Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(output): use new internal data structure #1609

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

G-Rath
Copy link
Collaborator

@G-Rath G-Rath commented Feb 12, 2025

This is an initial attempt to convert the table output to use the new Output.Result and co that is being used for container scanning.

Overall, I think I'm actually pretty close, but I'm going to put it down for now to focus on slog integration

@G-Rath G-Rath changed the title Output/use new struct feat(output): use new internal data structure Feb 12, 2025
@@ -1024,12 +1024,12 @@
╭───────────────────────┬──────┬───────────┬───────────────┬─────────┬──────── ≈
│ OSV URL │ CVSS │ ECOSYSTEM │ PACKAGE │ VERSION │ SOURCE ≈
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has revealed that our test cases are structured in a way that is not expected by the new data format - we've got at least one case where a models.PackageSource has packages from different ecosystems, which results in those packages getting associated with a single ecosystem due to output.BuildResults assuming all packages in a source will belong to the same ecosystem

@hogo6002 would you mind reviewing the "multiple sources with a mixed count of packages across ecosystems, and multiple vulnerabilities" case (among others) to confirm if its valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that if all packages originate from a single source file, they belong to the same ecosystem. For example, a go.mod file can only contain Go packages. For the test here, when we mock data, and we assign the same source path to packages from different ecosystems. I think the tests are more incorrect here. (is it possible define Nuget and Packagist packages in one lockfile?)

In the extracting code, we determine the package ecosystem based on the extractor used. So I don't think we will encounter a scenario where a single lockfile contains packages from different ecosystems. But I can also easily modify the outputResult code to handle this if we think this type of case is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@another-rex what do you think Rex

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think we should be prepared to support it unless it's extremely hard, because it allows support for arbitrary "files" - though that handling could be done here or at the parser level.

First though, my example case: maybe one day we support an arbitrary CSV or JSON file, meaning I can say to the scanner "here is a single file that has multiple packages from different ecosystems".

On the one hand, that's a single file so technically it's from the same source and thus could end up here as a single source - on the other hand, we could decide to handle that at the extractor level i.e. we return a source for each ecosystem that all point to the same file.

While we've talked enough in the past about CSV support to know that that specifically is unlikely, but I still think its an easy example of how in future maybe some kind of file will come along that we do want to support.

Related, is this not actually a situation that's already possible with SBOM? can that not include packages across multiple ecosystems?

I can also easily modify the outputResult code to handle this if we think this type of case is possible.

Again, I think if it's very easy then it's probably worth doing just so we don't have to every think about it again 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't know about how we get ecosystems for packages in SBOM. I can just modify the OutputResult code to add support for this case, it's just a few lines code change.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Feb 12, 2025

(I think we might also have a sort bug or two, but first I want to confirm if the test cases are actually valid since that'll be a huge source of difference either way)

@@ -53,6 +53,8 @@ type PackageResult struct {
HiddenVulns []VulnResult
LayerDetail PackageContainerInfo
VulnCount VulnCount
Path string `json:"-"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the package path same as SourceReuslt.Name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah looks like it probably is - I was just trying to get things working enough that the snapshots weren't completely bogus before I started refining things.

fwiw, I do think the fields could do with some documentation to make it easier to find things - for example, whats the difference between RegularVulns and HiddenVulns? and why is OSPackageNames a thing - when should I use that instead of Name? etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add more comments to clarify.
RegularVulns refers to the called vulnerabilities, and HiddenVulns are those we've filtered out for users. These hidden vulnerabilities might be unimportant (for OS packages), or they could be uncalled (source packages). The OSPackageName field exists because OS packages have both source name and package name (the binary file name). For example, the source package krb5 on Debian might result in binary package names like krb5-localesand libkrb5-3. For project scanning, I think the package.name is enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an issue to track this.

@@ -1024,12 +1024,12 @@
╭───────────────────────┬──────┬───────────┬───────────────┬─────────┬──────── ≈
│ OSV URL │ CVSS │ ECOSYSTEM │ PACKAGE │ VERSION │ SOURCE ≈
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that if all packages originate from a single source file, they belong to the same ecosystem. For example, a go.mod file can only contain Go packages. For the test here, when we mock data, and we assign the same source path to packages from different ecosystems. I think the tests are more incorrect here. (is it possible define Nuget and Packagist packages in one lockfile?)

In the extracting code, we determine the package ecosystem based on the extractor used. So I don't think we will encounter a scenario where a single lockfile contains packages from different ecosystems. But I can also easily modify the outputResult code to handle this if we think this type of case is possible.

@G-Rath G-Rath force-pushed the output/use-new-struct branch from e5bf907 to ab9e54d Compare February 18, 2025 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants