-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve VulnerabilityReference
semantics
#9091
Conversation
cb36eb6
to
26d0401
Compare
VulnerabilityReference
semantics
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9091 +/- ##
=========================================
Coverage 67.08% 67.08%
+ Complexity 1188 1187 -1
=========================================
Files 240 240
Lines 7901 7901
Branches 915 914 -1
=========================================
Hits 5300 5300
- Misses 2232 2233 +1
+ Partials 369 368 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
26d0401
to
720d359
Compare
29d3513
to
78670fb
Compare
I will make the necessary changes to once this is merged. |
// The ORT and OSV vulnerability data models are different in that ORT uses a severity for each reference (assuming | ||
// that different references could use different severities), whereas OSV manages severities and references on the | ||
// same level, which means it is not possible to identify whether a reference belongs to a specific severity. | ||
// To map between these different model, simply use the "cartesian product" to create an ORT reference for each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit-message: Please explain why the carthesian product is an improvement?
Have you considered to simply define a priority on the scoring algorithm, and pick the first WRT. to that priority?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain why the carthesian product is an improvement?
Done.
Have you considered to simply define a priority on the scoring algorithm, and pick the first WRT. to that priority?
No, as I believe such a prioritization of the data to maintain should be done downstream, e.g. by reporter implementations. In the data model itself I prefer to maintain as much (non-reconstructable) data as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, as I believe such a prioritization of the data to maintain should be done downstream, e.g. by reporter implementations. In the data model itself I prefer to maintain as much (non-reconstructable) data as possible.
I'm just trying to understand, what could be the benefit in keeping e.g. a HIGH, MEDIUM, LOW value, when e.g. a CVSS V3 vector is available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what could be the benefit in keeping e.g. a HIGH, MEDIUM, LOW value, when e.g. a CVSS V3 vector is available?
Ease of use. E.g. a reporter implementation would then not need to depend on a library like the CVSS Calculator to calculate a human-readable severity rating from a vector. Same goes for the (base) score.
In essence, if the upstream API of an advisor also provides separate fields for these, IMO it's a good indication that it makes sense to maintain these in ORT as well. During my changes, I found that no advisor provides only the vector, though you could theoretically re-calculate all data from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ease of use. E.g. a reporter implementation would then not need to depend on a library like the CVSS Calculator to calculate a human-readable severity rating from a vector. Same goes for the (base) score.
I presume that code which parses an OrtResult is using ORT as a library anyway. So, adding a function for doing the conversion would do as well, instead of adding redundancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If information was guaranteed to at least be consistent, but this isn't the case.
Could you please elaborate what you're referring to here? Where is data inconsistent currently?
It's not just the redundancy in the severity, but the reference now get duplicated 1x per severity, so references become redundant as well.
I'm not sure whether "redundant" is strictly the right term here. Here's an example from OsvFunTest
about what OSV reports:
severity = {LinkedHashSet@8317} size = 2
0 = {Severity@9729} Severity(type=CVSS_V3, score=CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:H/A:N)
1 = {Severity@9730} Severity(type=CVSS_V4, score=CVSS:4.0/AV:N/AC:L/AT:N/PR:L/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N)
severity {java.util.Set}
affected = {LinkedHashSet@8771} size = 3
affected {java.util.Set}
references = {LinkedHashSet@8772} size = 3
0 = {Reference@9734} Reference(type=ADVISORY, url=https://nvd.nist.gov/vuln/detail/CVE-2022-29946)
1 = {Reference@9735} Reference(type=ADVISORY, url=https://github.com/advisories/GHSA-2h2x-8hh2-mfq8)
2 = {Reference@9736} Reference(type=WEB, url=https://github.com/nats-io/advisories/blob/main/CVE/CVE-2022-29946.txt)
So with OSV, you lose the information about which of the references defines a CVSS_V3
and / or CVSS_V4
score. Also, the original implementation would have only maintained the first CVSS_V3
score, losing details about the longer CVSS_V4
vector.
The new implementation maintains all url
fields from references for each of CVSS_V3
and CVSS_V4
. As I assume policy rules to by mostly written against a certain scoring system like CVSS_V4
, it's actually a benefit to still have all URLs for that scoring system available.
We could nevertheless at some point decide to change ORT's data model here to match OSV's. But such a decision would be a different topic that's out of scope of this PR, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate what you're referring to here? Where is data inconsistent currently?
I assumed that severity[]
from OSV contained exactly the de-duplicated severities corresponding to each of the references
, and further assumed that these might not necessarily be identical semantic wise. Maybe this assumption is wrong and severity
only contains multiple representations of the same severity? (Then it should be fine)
I'm not sure whether "redundant" is strictly the right term here. Here's an example from OsvFunTest about what OSV reports:
As I assume policy rules to by mostly written against a certain scoring system like CVSS_V4, it's actually a benefit to still have all URLs for that scoring system available.
The policy rules IIUC have to match 6 references, while the input was just 2 severities. I assumed that policy rules given a vulnerability would do as follows:
- Determine the most accurate representation of the severity amongst the references
- Evaluated the policy against that single severity
This is why I thought, if all other representations were not relevant to the policy, why add them to the data model if we end up with 6 entries of severities where just 2 are given, while it could be reduced to just 1. That's maybe a bit simpler. Without reduction, entries could be kept same amount by doing the following which you mentioned already:
We could nevertheless at some point decide to change ORT's data model here to match OSV's. But such a decision would be a different topic that's out of scope of this PR, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this assumption is wrong and severity only contains multiple representations of the same severity? (Then it should be fine)
That's my understanding, yes. So the same vulnerability is once described via CVSS_V3, and once via CVSS_V4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised that this change does not affect any test results which would demonstrate the outcome?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that the only other severity representation added by this change is a CVSS_V4 representation which is not supported by the used CVSS Calculator library yet and thus gets "swallowed".
78670fb
to
575a75b
Compare
// The ORT and OSV vulnerability data models are different in that ORT uses a severity for each reference (assuming | ||
// that different references could use different severities), whereas OSV manages severities and references on the | ||
// same level, which means it is not possible to identify whether a reference belongs to a specific severity. | ||
// To map between these different model, simply use the "cartesian product" to create an ORT reference for each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, maybe all in all I just didn't find the redundancy so nice. If information was guaranteed to at least be consistent, but this isn't the case.
It's not just the redundancy in the severity, but the reference now get duplicated 1x per severity, so references become redundant as well. I'm not worried about the size of the result at all, but just about the redundancy itself, because the policy rules then will need to deal with that redundancy: handle inconsistency and do de-duplication of references / checks.
Anyhow, it seems opinionated. I'm not going to block this PR for this point.
575a75b
to
08c6bce
Compare
model/src/main/kotlin/vulnerabilities/VulnerabilityReference.kt
Outdated
Show resolved
Hide resolved
model/src/main/kotlin/vulnerabilities/VulnerabilityReference.kt
Outdated
Show resolved
Hide resolved
// The ORT and OSV vulnerability data models are different in that ORT uses a severity for each reference (assuming | ||
// that different references could use different severities), whereas OSV manages severities and references on the | ||
// same level, which means it is not possible to identify whether a reference belongs to a specific severity. | ||
// To map between these different model, simply use the "cartesian product" to create an ORT reference for each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised that this change does not affect any test results which would demonstrate the outcome?
This is a preparation for directly dealing with OSV enum names as scoring systems in an upcoming change. Signed-off-by: Sebastian Schuberth <[email protected]>
As of VulnerableCode 31.0.0 [1] there is a `scoring_elements` field in the response which contains the CVSS vector. Prepare for making use of it by adding it to the data model. [1]: https://github.com/aboutcode-org/vulnerablecode/blob/main/CHANGELOG.rst#version-v3100 Signed-off-by: Sebastian Schuberth <[email protected]>
08c6bce
to
0f8a161
Compare
Signed-off-by: Sebastian Schuberth <[email protected]>
This is a preparation for exclusively using the `getQualitativeRating()` function going forward. Signed-off-by: Sebastian Schuberth <[email protected]>
bd1b1d1
to
ebcb66e
Compare
3174d5f
to
b0b0021
Compare
Previously, the semantics of the `severity` property were too loaded as the string could have also represented a numeric value. Introduce a dedicated `score` for the latter and clarify that `severity` from now on is supposed to exclusively contain a semantic rating like "LOW" or "HIGH". Additionally, start to maintain the individual metrics as usually encoded into a vector string. This allows for more sophisticated visualizations e.g. via [1]. Adjust tests accordingly and make some smaller improvements along the way. [1]: https://github.com/org-metaeffekt/metaeffekt-universal-cvss-calculator Signed-off-by: Sebastian Schuberth <[email protected]>
In particular, do not throw on CVSS version 4 vectors. Signed-off-by: Sebastian Schuberth <[email protected]>
The original comment and implementation was misleading when saying "only one representation is actually possible currently" as it left unclear whether that refers to a limitation in the OSV or ORT data model. In any case, it is possible to maintain all OSV `severity` data by creating ORT vulnerability references for all combinations of severities and references. Do that to not lose any information about possible alternative severity type / score pairs. Signed-off-by: Sebastian Schuberth <[email protected]>
Only look for prefixes when matching scoring system names to properly recognize e.g. "cvssv3.1_qr" as a `Cvss3Rating`. Signed-off-by: Sebastian Schuberth <[email protected]>
518bc6c
to
ea95f4b
Compare
Merging despite the unrelated Python test failures. |
See [1]. [1]: oss-review-toolkit/ort#9091 Signed-off-by: Sebastian Schuberth <[email protected]>
See [1]. [1]: oss-review-toolkit/ort#9091 Signed-off-by: Sebastian Schuberth <[email protected]>
See [1]. [1]: oss-review-toolkit/ort#9091 Signed-off-by: Sebastian Schuberth <[email protected]>
|
See [1]. [1]: oss-review-toolkit/ort#9091 Signed-off-by: Sebastian Schuberth <[email protected]>
Please have a look at the individual commit messages for the details.
Note
This breaking change requires vulnerability rules to be adjusted as shown here.