feat(output): use new internal data structure #1609

G-Rath · 2025-02-12T00:49:59Z

This is an initial attempt to convert the table output to use the new Output.Result and co that is being used for container scanning.

Overall, I think I'm actually pretty close, but I'm going to put it down for now to focus on slog integration

G-Rath · 2025-02-12T00:53:26Z

internal/output/__snapshots__/table_test.snap

@@ -1024,12 +1024,12 @@
 ╭───────────────────────┬──────┬───────────┬───────────────┬─────────┬──────── ≈
 │ OSV URL               │ CVSS │ ECOSYSTEM │ PACKAGE       │ VERSION │ SOURCE  ≈


I think this has revealed that our test cases are structured in a way that is not expected by the new data format - we've got at least one case where a models.PackageSource has packages from different ecosystems, which results in those packages getting associated with a single ecosystem due to output.BuildResults assuming all packages in a source will belong to the same ecosystem

@hogo6002 would you mind reviewing the "multiple sources with a mixed count of packages across ecosystems, and multiple vulnerabilities" case (among others) to confirm if its valid?

I assumed that if all packages originate from a single source file, they belong to the same ecosystem. For example, a go.mod file can only contain Go packages. For the test here, when we mock data, and we assign the same source path to packages from different ecosystems. I think the tests are more incorrect here. (is it possible define Nuget and Packagist packages in one lockfile?)

In the extracting code, we determine the package ecosystem based on the extractor used. So I don't think we will encounter a scenario where a single lockfile contains packages from different ecosystems. But I can also easily modify the outputResult code to handle this if we think this type of case is possible.

@another-rex what do you think Rex

Personally I think we should be prepared to support it unless it's extremely hard, because it allows support for arbitrary "files" - though that handling could be done here or at the parser level.

First though, my example case: maybe one day we support an arbitrary CSV or JSON file, meaning I can say to the scanner "here is a single file that has multiple packages from different ecosystems".

On the one hand, that's a single file so technically it's from the same source and thus could end up here as a single source - on the other hand, we could decide to handle that at the extractor level i.e. we return a source for each ecosystem that all point to the same file.

While we've talked enough in the past about CSV support to know that that specifically is unlikely, but I still think its an easy example of how in future maybe some kind of file will come along that we do want to support.

Related, is this not actually a situation that's already possible with SBOM? can that not include packages across multiple ecosystems?

I can also easily modify the outputResult code to handle this if we think this type of case is possible.

Again, I think if it's very easy then it's probably worth doing just so we don't have to every think about it again 🤷

I actually don't know about how we get ecosystems for packages in SBOM. I can just modify the OutputResult code to add support for this case, it's just a few lines code change.

G-Rath · 2025-02-12T00:55:23Z

(I think we might also have a sort bug or two, but first I want to confirm if the test cases are actually valid since that'll be a huge source of difference either way)

hogo6002 · 2025-02-12T02:01:59Z

internal/output/output_result.go

@@ -53,6 +53,8 @@ type PackageResult struct {
 	HiddenVulns      []VulnResult
 	LayerDetail      PackageContainerInfo
 	VulnCount        VulnCount
+	Path             string   `json:"-"`


Is the package path same as SourceReuslt.Name

yeah looks like it probably is - I was just trying to get things working enough that the snapshots weren't completely bogus before I started refining things.

fwiw, I do think the fields could do with some documentation to make it easier to find things - for example, whats the difference between RegularVulns and HiddenVulns? and why is OSPackageNames a thing - when should I use that instead of Name? etc

I'll add more comments to clarify.
RegularVulns refers to the called vulnerabilities, and HiddenVulns are those we've filtered out for users. These hidden vulnerabilities might be unimportant (for OS packages), or they could be uncalled (source packages). The OSPackageName field exists because OS packages have both source name and package name (the binary file name). For example, the source package krb5 on Debian might result in binary package names like krb5-localesand libkrb5-3. For project scanning, I think the package.name is enough

Added an issue to track this.

hogo6002 · 2025-02-12T02:37:14Z

internal/output/__snapshots__/table_test.snap

@@ -1024,12 +1024,12 @@
 ╭───────────────────────┬──────┬───────────┬───────────────┬─────────┬──────── ≈
 │ OSV URL               │ CVSS │ ECOSYSTEM │ PACKAGE       │ VERSION │ SOURCE  ≈


I assumed that if all packages originate from a single source file, they belong to the same ecosystem. For example, a go.mod file can only contain Go packages. For the test here, when we mock data, and we assign the same source path to packages from different ecosystems. I think the tests are more incorrect here. (is it possible define Nuget and Packagist packages in one lockfile?)

In the extracting code, we determine the package ecosystem based on the extractor used. So I don't think we will encounter a scenario where a single lockfile contains packages from different ecosystems. But I can also easily modify the outputResult code to handle this if we think this type of case is possible.

G-Rath changed the title ~~Output/use new struct~~ feat(output): use new internal data structure Feb 12, 2025

G-Rath commented Feb 12, 2025

View reviewed changes

G-Rath mentioned this pull request Feb 12, 2025

fix(output): ensure that vulnerabilities are sorted by ID across groups in table output #1598

Open

hogo6002 reviewed Feb 12, 2025

View reviewed changes

G-Rath added 2 commits February 18, 2025 14:13

feat(output): use new internal data structure

92634c2

test: update snapshots

ab9e54d

G-Rath force-pushed the output/use-new-struct branch from e5bf907 to ab9e54d Compare February 18, 2025 01:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(output): use new internal data structure #1609

feat(output): use new internal data structure #1609

G-Rath commented Feb 12, 2025

G-Rath Feb 12, 2025

hogo6002 Feb 12, 2025

hogo6002 Feb 12, 2025

G-Rath Feb 12, 2025

hogo6002 Feb 12, 2025

G-Rath commented Feb 12, 2025

hogo6002 Feb 12, 2025

G-Rath Feb 12, 2025

hogo6002 Feb 12, 2025

hogo6002 Feb 12, 2025

hogo6002 Feb 12, 2025

		@@ -1024,12 +1024,12 @@
		╭───────────────────────┬──────┬───────────┬───────────────┬─────────┬──────── ≈
		│ OSV URL │ CVSS │ ECOSYSTEM │ PACKAGE │ VERSION │ SOURCE ≈

feat(output): use new internal data structure #1609

Are you sure you want to change the base?

feat(output): use new internal data structure #1609

Conversation

G-Rath commented Feb 12, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

G-Rath commented Feb 12, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment