Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve VulnerabilityReference semantics #9091

Merged
merged 8 commits into from
Sep 18, 2024
Merged

Conversation

sschuberth
Copy link
Member

@sschuberth sschuberth commented Sep 6, 2024

Please have a look at the individual commit messages for the details.

Note

This breaking change requires vulnerability rules to be adjusted as shown here.

@sschuberth sschuberth force-pushed the vulnerability-vectors branch from cb36eb6 to 26d0401 Compare September 13, 2024 17:00
@sschuberth sschuberth changed the title Vulnerability vectors Improve VulnerabilityReference semantics Sep 13, 2024
@sschuberth sschuberth marked this pull request as ready for review September 13, 2024 17:05
@sschuberth sschuberth requested a review from a team as a code owner September 13, 2024 17:05
@sschuberth sschuberth enabled auto-merge (rebase) September 13, 2024 17:05
Copy link

codecov bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 58.33333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 67.08%. Comparing base (9f7b625) to head (ea95f4b).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
...n/kotlin/vulnerabilities/VulnerabilityReference.kt 44.44% 1 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #9091   +/-   ##
=========================================
  Coverage     67.08%   67.08%           
+ Complexity     1188     1187    -1     
=========================================
  Files           240      240           
  Lines          7901     7901           
  Branches        915      914    -1     
=========================================
  Hits           5300     5300           
- Misses         2232     2233    +1     
+ Partials        369      368    -1     
Flag Coverage Δ
funTest-non-docker 34.49% <0.00%> (ø)
test 36.60% <58.33%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sschuberth sschuberth force-pushed the vulnerability-vectors branch from 26d0401 to 720d359 Compare September 13, 2024 18:38
@sschuberth sschuberth force-pushed the vulnerability-vectors branch 2 times, most recently from 29d3513 to 78670fb Compare September 13, 2024 19:28
@sschuberth
Copy link
Member Author

// The ORT and OSV vulnerability data models are different in that ORT uses a severity for each reference (assuming
// that different references could use different severities), whereas OSV manages severities and references on the
// same level, which means it is not possible to identify whether a reference belongs to a specific severity.
// To map between these different model, simply use the "cartesian product" to create an ORT reference for each
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit-message: Please explain why the carthesian product is an improvement?

Have you considered to simply define a priority on the scoring algorithm, and pick the first WRT. to that priority?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why the carthesian product is an improvement?

Done.

Have you considered to simply define a priority on the scoring algorithm, and pick the first WRT. to that priority?

No, as I believe such a prioritization of the data to maintain should be done downstream, e.g. by reporter implementations. In the data model itself I prefer to maintain as much (non-reconstructable) data as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as I believe such a prioritization of the data to maintain should be done downstream, e.g. by reporter implementations. In the data model itself I prefer to maintain as much (non-reconstructable) data as possible.

I'm just trying to understand, what could be the benefit in keeping e.g. a HIGH, MEDIUM, LOW value, when e.g. a CVSS V3 vector is available?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what could be the benefit in keeping e.g. a HIGH, MEDIUM, LOW value, when e.g. a CVSS V3 vector is available?

Ease of use. E.g. a reporter implementation would then not need to depend on a library like the CVSS Calculator to calculate a human-readable severity rating from a vector. Same goes for the (base) score.

In essence, if the upstream API of an advisor also provides separate fields for these, IMO it's a good indication that it makes sense to maintain these in ORT as well. During my changes, I found that no advisor provides only the vector, though you could theoretically re-calculate all data from it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ease of use. E.g. a reporter implementation would then not need to depend on a library like the CVSS Calculator to calculate a human-readable severity rating from a vector. Same goes for the (base) score.

I presume that code which parses an OrtResult is using ORT as a library anyway. So, adding a function for doing the conversion would do as well, instead of adding redundancy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If information was guaranteed to at least be consistent, but this isn't the case.

Could you please elaborate what you're referring to here? Where is data inconsistent currently?

It's not just the redundancy in the severity, but the reference now get duplicated 1x per severity, so references become redundant as well.

I'm not sure whether "redundant" is strictly the right term here. Here's an example from OsvFunTest about what OSV reports:

severity = {LinkedHashSet@8317}  size = 2
 0 = {Severity@9729} Severity(type=CVSS_V3, score=CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:H/A:N)
 1 = {Severity@9730} Severity(type=CVSS_V4, score=CVSS:4.0/AV:N/AC:L/AT:N/PR:L/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N)
severity {java.util.Set} 
affected = {LinkedHashSet@8771}  size = 3
affected {java.util.Set} 
references = {LinkedHashSet@8772}  size = 3
 0 = {Reference@9734} Reference(type=ADVISORY, url=https://nvd.nist.gov/vuln/detail/CVE-2022-29946)
 1 = {Reference@9735} Reference(type=ADVISORY, url=https://github.com/advisories/GHSA-2h2x-8hh2-mfq8)
 2 = {Reference@9736} Reference(type=WEB, url=https://github.com/nats-io/advisories/blob/main/CVE/CVE-2022-29946.txt)

So with OSV, you lose the information about which of the references defines a CVSS_V3 and / or CVSS_V4 score. Also, the original implementation would have only maintained the first CVSS_V3 score, losing details about the longer CVSS_V4 vector.

The new implementation maintains all url fields from references for each of CVSS_V3 and CVSS_V4. As I assume policy rules to by mostly written against a certain scoring system like CVSS_V4, it's actually a benefit to still have all URLs for that scoring system available.

We could nevertheless at some point decide to change ORT's data model here to match OSV's. But such a decision would be a different topic that's out of scope of this PR, IMO.

Copy link
Member

@fviernau fviernau Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate what you're referring to here? Where is data inconsistent currently?

I assumed that severity[] from OSV contained exactly the de-duplicated severities corresponding to each of the references, and further assumed that these might not necessarily be identical semantic wise. Maybe this assumption is wrong and severity only contains multiple representations of the same severity? (Then it should be fine)

I'm not sure whether "redundant" is strictly the right term here. Here's an example from OsvFunTest about what OSV reports:

As I assume policy rules to by mostly written against a certain scoring system like CVSS_V4, it's actually a benefit to still have all URLs for that scoring system available.

The policy rules IIUC have to match 6 references, while the input was just 2 severities. I assumed that policy rules given a vulnerability would do as follows:

  1. Determine the most accurate representation of the severity amongst the references
  2. Evaluated the policy against that single severity

This is why I thought, if all other representations were not relevant to the policy, why add them to the data model if we end up with 6 entries of severities where just 2 are given, while it could be reduced to just 1. That's maybe a bit simpler. Without reduction, entries could be kept same amount by doing the following which you mentioned already:

We could nevertheless at some point decide to change ORT's data model here to match OSV's. But such a decision would be a different topic that's out of scope of this PR, IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this assumption is wrong and severity only contains multiple representations of the same severity? (Then it should be fine)

That's my understanding, yes. So the same vulnerability is once described via CVSS_V3, and once via CVSS_V4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that this change does not affect any test results which would demonstrate the outcome?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that the only other severity representation added by this change is a CVSS_V4 representation which is not supported by the used CVSS Calculator library yet and thus gets "swallowed".

plugins/advisors/osv/src/main/kotlin/Osv.kt Show resolved Hide resolved
// The ORT and OSV vulnerability data models are different in that ORT uses a severity for each reference (assuming
// that different references could use different severities), whereas OSV manages severities and references on the
// same level, which means it is not possible to identify whether a reference belongs to a specific severity.
// To map between these different model, simply use the "cartesian product" to create an ORT reference for each
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, maybe all in all I just didn't find the redundancy so nice. If information was guaranteed to at least be consistent, but this isn't the case.

It's not just the redundancy in the severity, but the reference now get duplicated 1x per severity, so references become redundant as well. I'm not worried about the size of the result at all, but just about the redundancy itself, because the policy rules then will need to deal with that redundancy: handle inconsistency and do de-duplication of references / checks.

Anyhow, it seems opinionated. I'm not going to block this PR for this point.

evaluator/src/main/kotlin/PackageRule.kt Outdated Show resolved Hide resolved
examples/example.rules.kts Outdated Show resolved Hide resolved
// The ORT and OSV vulnerability data models are different in that ORT uses a severity for each reference (assuming
// that different references could use different severities), whereas OSV manages severities and references on the
// same level, which means it is not possible to identify whether a reference belongs to a specific severity.
// To map between these different model, simply use the "cartesian product" to create an ORT reference for each
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that this change does not affect any test results which would demonstrate the outcome?

This is a preparation for directly dealing with OSV enum names as
scoring systems in an upcoming change.

Signed-off-by: Sebastian Schuberth <[email protected]>
As of VulnerableCode 31.0.0 [1] there is a `scoring_elements` field in
the response which contains the CVSS vector. Prepare for making use of
it by adding it to the data model.

[1]: https://github.com/aboutcode-org/vulnerablecode/blob/main/CHANGELOG.rst#version-v3100

Signed-off-by: Sebastian Schuberth <[email protected]>
@sschuberth sschuberth force-pushed the vulnerability-vectors branch from 08c6bce to 0f8a161 Compare September 17, 2024 09:19
This is a preparation for exclusively using the `getQualitativeRating()`
function going forward.

Signed-off-by: Sebastian Schuberth <[email protected]>
@sschuberth sschuberth force-pushed the vulnerability-vectors branch 2 times, most recently from bd1b1d1 to ebcb66e Compare September 17, 2024 15:36
@sschuberth sschuberth force-pushed the vulnerability-vectors branch 2 times, most recently from 3174d5f to b0b0021 Compare September 17, 2024 15:53
Previously, the semantics of the `severity` property were too loaded as
the string could have also represented a numeric value. Introduce a
dedicated `score` for the latter and clarify that `severity` from now on
is supposed to exclusively contain a semantic rating like "LOW" or "HIGH".

Additionally, start to maintain the individual metrics as usually encoded
into a vector string. This allows for more sophisticated visualizations
e.g. via [1].

Adjust tests accordingly and make some smaller improvements along the way.

[1]: https://github.com/org-metaeffekt/metaeffekt-universal-cvss-calculator

Signed-off-by: Sebastian Schuberth <[email protected]>
In particular, do not throw on CVSS version 4 vectors.

Signed-off-by: Sebastian Schuberth <[email protected]>
The original comment and implementation was misleading when saying "only
one representation is actually possible currently" as it left unclear
whether that refers to a limitation in the OSV or ORT data model. In any
case, it is possible to maintain all OSV `severity` data by creating ORT
vulnerability references for all combinations of severities and
references. Do that to not lose any information about possible alternative
severity type / score pairs.

Signed-off-by: Sebastian Schuberth <[email protected]>
Only look for prefixes when matching scoring system names to properly
recognize e.g. "cvssv3.1_qr" as a `Cvss3Rating`.

Signed-off-by: Sebastian Schuberth <[email protected]>
@sschuberth sschuberth force-pushed the vulnerability-vectors branch from 518bc6c to ea95f4b Compare September 17, 2024 16:43
@sschuberth sschuberth disabled auto-merge September 18, 2024 06:26
@sschuberth
Copy link
Member Author

Merging despite the unrelated Python test failures.

@sschuberth sschuberth merged commit 01ca824 into main Sep 18, 2024
20 of 22 checks passed
@sschuberth sschuberth deleted the vulnerability-vectors branch September 18, 2024 06:27
sschuberth added a commit to oss-review-toolkit/ort-config that referenced this pull request Sep 18, 2024
sschuberth added a commit to oss-review-toolkit/ort-config that referenced this pull request Sep 18, 2024
sschuberth added a commit to oss-review-toolkit/ort-config that referenced this pull request Sep 18, 2024
@sschuberth
Copy link
Member Author

sschuberth added a commit to oss-review-toolkit/ort-config that referenced this pull request Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants