feat(combine-to-osv): support intelligent range merging #5459
jess-lowe wants to merge 13 commits into
If ranges are not simple enough to merge boundaries, we select the best range using the following hierarchy:
1. **Fixed Priority**: A range with bounded `fixed` version or commit information is prioritized over a range with open-ended `last_affected` information.
2. **Constrained Range Priority**: We prefer ranges that define a specific non-zero `introduced` bound over those that start at `"0"`.
3. **CPE_RANGE Source Priority**: We prefer ranges whose metadata source is `"CPE_RANGE"` because they are extracted from explicit config nodes rather than inferred from text.
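The hierarchy above can be read as a lexicographic comparison. The following is only a minimal Python sketch of that idea, not the PR's `pickBestRange` implementation: the function names, the dict-based range shape, and the tie-breaking rule (keep the first range) are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical range shape: {"events": [...], "database_specific": {"source": ...}}
Range = dict[str, Any]


def has_fixed(r: Range) -> bool:
    """True if the range carries at least one `fixed` event."""
    return any("fixed" in e for e in r.get("events", []))


def has_nonzero_introduced(r: Range) -> bool:
    """True if some `introduced` event names a bound other than "0"."""
    return any(e.get("introduced") not in (None, "0") for e in r.get("events", []))


def is_cpe_range(r: Range) -> bool:
    """True if the range metadata lists "CPE_RANGE" as a source."""
    source = r.get("database_specific", {}).get("source", [])
    sources = source if isinstance(source, list) else [source]
    return "CPE_RANGE" in sources


def priority_key(r: Range) -> tuple[bool, bool, bool]:
    # Tuple order mirrors the hierarchy: fixed, then constrained, then CPE_RANGE.
    return (has_fixed(r), has_nonzero_introduced(r), is_cpe_range(r))


def pick_best_range(a: Range, b: Range) -> Range:
    """Return the higher-priority range; on a tie, keep `a` (an assumption)."""
    return a if priority_key(a) >= priority_key(b) else b
```

Because the priorities live in a single tuple, changing the order of the rules only means reordering the entries in `priority_key`.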
Should this be below 1 & 2? Seems like CPE ranges are better than a random introduced inferred from the description, or maybe not?
* Duplicate entries inside `extracted_events` are removed.

### Last-Affected Cleanup
At the end of the selection or merging process, if the final range contains at least one explicit `fixed` commit or version event, any `last_affected` events are automatically removed from the range to maintain clean, bounded schema compliance.
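As a rough illustration of the cleanup rule described above, here is a minimal Python sketch. The function name and the dict-based event shape are assumptions for illustration; the PR's actual code is not shown here.

```python
from typing import Any

# Hypothetical range shape: {"events": [{"introduced": ...}, {"fixed": ...}, ...]}
Range = dict[str, Any]


def drop_last_affected_if_fixed(r: Range) -> Range:
    """Return a copy of `r` without `last_affected` events when a `fixed` event exists."""
    events = r.get("events", [])
    if not any("fixed" in e for e in events):
        # No fixed bound: leave the range (and its last_affected events) untouched.
        return r
    cleaned = dict(r)
    cleaned["events"] = [e for e in events if "last_affected" not in e]
    return cleaned
```

The sketch returns a new dict rather than mutating its input, so the dropped `last_affected` values stay available to the caller if they are later needed elsewhere.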
Hmm, still not sure I agree with this. This will mostly be enumerated versions, right? It feels like we should still keep those somehow, though I'm not 100% sure how. The `versions: []` array maybe?
Nah, this happens a lot more frequently than enumerated versions, and is against the schema. I'll handle enumerated versions in the future.
This is also something we've done historically with the NVD conversion.
Are there cases where NVD or CVE have multiple ranges each? Or cases where both CVE and NVD each have full (introduced-fixed) ranges? These don't seem to be covered in the test cases.
This PR refactors the range merging and NVD version extraction logic in `vulnfeeds` to produce more consolidated OSV records. It unifies repository ranges under a single `Affected` block, and implements a robust range-merging priority hierarchy (`pickBestRange`).

* … `Affected` package object in the output OSV record, significantly improving structure cleanliness.

### Intelligent Range Selection & Merging (`pickBestRange`)

* **Fixed Priority**: Ranges with `fixed` events are prioritized over open-ended `last_affected` ranges.
* **Constrained Range Priority**: Ranges with a specific `introduced` commit or version are preferred over generic `introduced: "0"` ranges.
* **`CPE_RANGE` Source Priority**: Ranges derived from explicit CPE ranges are preferred over other sources.
* … `"REFERENCES"` are now merged (appended) directly into the CVE range rather than discarded.
* `database_specific` metadata maps (including `source` values and `extracted_events`) are combined and deduplicated.
* **Last-Affected Cleanup**: Removes `last_affected` entries from the chosen or merged range if a `fixed` event or commit exists.

Also included a DESIGN.md file with the explanations for the flow of data.
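To make the "combined and deduplicated" behaviour for `database_specific` maps concrete, here is a minimal Python sketch. It assumes each metadata value is either a scalar or a list, and that merged values are always stored as lists; the function names and that data shape are hypothetical, not taken from the PR.

```python
from typing import Any


def _as_list(value: Any) -> list[Any]:
    """Wrap scalars so every metadata value can be concatenated as a list."""
    return list(value) if isinstance(value, list) else [value]


def _dedupe(items: list[Any]) -> list[Any]:
    """Drop repeats while preserving first-seen order (works for dict entries too)."""
    out: list[Any] = []
    for item in items:
        if item not in out:
            out.append(item)
    return out


def merge_database_specific(a: dict[str, Any], b: dict[str, Any]) -> dict[str, list[Any]]:
    """Combine two metadata maps key by key, deduplicating each value list."""
    merged: dict[str, list[Any]] = {}
    for key in _dedupe(list(a) + list(b)):
        merged[key] = _dedupe(_as_list(a.get(key, [])) + _as_list(b.get(key, [])))
    return merged
```

Equality-based deduplication is quadratic, which should be fine for the handful of events a single range carries; a larger input would call for a hashable key per event.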