[bug] Ingesting SBOMs results in license error #2127

Closed
nathannaveen opened this issue Sep 11, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@nathannaveen
Contributor

Describe the bug

When ingesting some SBOMs we sometimes encounter the error:

{"level":"error","ts":1726087820.935968,"caller":"collector/collector.go:108","msg":"emit error: unable to ingest document: error assembling graphs for \"file:///cdx_guac.json\" : ingestLicenses failed with error: IngestLicenses failed with error: input: ingestLicenses LicenseRef name provided without inline.\n","guac-version":"v0.0.1-custom","documentHash":"sha256_37a12cc083d82a18fe3d342cb2f0121279c7c94e55860225d694caf8c5c4c85d","stacktrace":"github.com/guacsec/guac/pkg/handler/collector.Collect\n\t/Users/nathannaveen/go/src/github.com/nathannaveen/guac/pkg/handler/collector/collector.go:108\ngithub.com/guacsec/guac/cmd/guacone/cmd.init.func7\n\t/Users/nathannaveen/go/src/github.com/nathannaveen/guac/cmd/guacone/cmd/files.go:151\ngithub.com/spf13/cobra.(*Command).execute\n\t/Users/nathannaveen/go/pkg/mod/github.com/spf13/[email protected]/command.go:989\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/nathannaveen/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/Users/nathannaveen/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041\ngithub.com/guacsec/guac/cmd/guacone/cmd.Execute\n\t/Users/nathannaveen/go/src/github.com/nathannaveen/guac/cmd/guacone/cmd/root.go:57\nmain.main\n\t/Users/nathannaveen/go/src/github.com/nathannaveen/guac/cmd/guacone/main.go:23\nruntime.main\n\t/Users/nathannaveen/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:271"}

This happens with these two SBOMs:

cdx_guac.json
cdx_vuln.json

To Reproduce
Steps to reproduce the behavior:

  1. Start the GraphQL server: go run ./cmd/guacgql --gql-debug
  2. Run the ingestion command: go run ./cmd/guacone collect files cdx_guac.json

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
image

GUAC version

GUAC version: v0.8.2

@nathannaveen added the bug label Sep 11, 2024
@nathannaveen
Contributor Author

@jeffmendoza and I discussed this and identified the cause. When ingesting an SPDX SBOM, a license can be defined by either an id or a name. If the name is provided, the inline must also be attached. In CycloneDX, however, the inline is optional when the name is provided: https://cyclonedx.org/docs/1.6/json/#services_items_licenses_oneOf_i0_items_license_name.

If the inline isn't provided, then we can't actually know if it is a valid license. So, we should probably not add a license node if the license is defined via a name and the inline isn't provided.
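
To see which components in a CycloneDX SBOM hit this case, something like the following could help. It is a minimal sketch, not GUAC code: the struct and function names (bom, nameOnly, findNameOnly) are made up for illustration, it models only the CycloneDX JSON fields components[].licenses[].license.{id, name, text.content}, and it only looks at top-level components.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal, illustrative subset of the CycloneDX JSON shape.
type bom struct {
	Components []component `json:"components"`
}

type component struct {
	Name     string          `json:"name"`
	Licenses []licenseChoice `json:"licenses"`
}

type licenseChoice struct {
	License *license `json:"license"`
}

type license struct {
	ID   string `json:"id"`
	Name string `json:"name"`
	Text *struct {
		Content string `json:"content"`
	} `json:"text"`
}

// nameOnly reports whether a license is identified by name only,
// with no id and no inline text.
func nameOnly(l *license) bool {
	if l == nil || l.ID != "" || l.Name == "" {
		return false
	}
	return l.Text == nil || l.Text.Content == ""
}

// findNameOnly returns one "component: license name" entry for every
// name-only license found on a top-level component.
func findNameOnly(data []byte) ([]string, error) {
	var b bom
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, err
	}
	var out []string
	for _, c := range b.Components {
		for _, lc := range c.Licenses {
			if nameOnly(lc.License) {
				out = append(out, c.Name+": "+lc.License.Name)
			}
		}
	}
	return out, nil
}

func main() {
	// Hypothetical sample document; only pkg-b has a name without id or text.
	sample := []byte(`{
  "components": [
    {"name": "pkg-a", "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "pkg-b", "licenses": [{"license": {"name": "Ben's Cool License"}}]},
    {"name": "pkg-c", "licenses": [{"license": {"name": "Permissive 3000", "text": {"content": "..."}}}]}
  ]
}`)
	hits, err := findNameOnly(sample)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```

Running this against cdx_guac.json instead of the sample (for example by reading the file with os.ReadFile) should list the entries that are likely to trigger the error, assuming they sit on top-level components.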

@semmet95
Contributor

semmet95 commented Sep 23, 2024

Hey @nathannaveen
I ran into the same issue and dug deeper after reading your comment. It seems that for licenses that have neither the text field (this is what I assume "inline" means) nor the bom-ref field, the CycloneDX parser just returns a hash of the license name without adding a corresponding inline entry for it in the parser.
Would it make sense to remove the else block to fall back to the license ID for these cases?
That approach wouldn't work for licenses that only have the name, since there is no id field to fall back to.

UPDATE: for licenses that only have the name field, should we treat the name field the same as the id field, or should we skip their ingestion altogether?

@pxp928
Collaborator

pxp928 commented Sep 23, 2024

If the inline isn't provided, then we can't actually know if it is a valid license. So, we should probably not add a license node if the license is defined via a name and the inline isn't provided.

@jeffmendoza Based on the discussion, we do not add a license node, but do we still keep them as part of the license expression strings?

@funnelfiasco
Contributor

If the inline isn't provided, then we can't actually know if it is a valid license. So, we should probably not add a license node if the license is defined via a name and the inline isn't provided.

I'm not sure I agree with that. My inclination is to trust what software providers say the license of a software package is unless there's evidence to the contrary. FOSS projects in particular are prone to incomplete or ambiguous expression of licenses, so we have to work with what we have. The ClearlyDefined support helps here because it allows the user to find places where the declared and detected licenses don't match (of course, CD is also an incomplete data set).

There's also the question of "what is a valid license?" If I released some software under the Permissive 3000 license that I wrote (translated?), it's not in SPDX nor is it OSI-approved, but it's certainly a valid license in the sense that it does what licenses are supposed to do.

@pxp928
Collaborator

pxp928 commented Sep 23, 2024

@semmet95 Based on maintainer call:

If the inline isn't provided, then we can't actually know if it is a valid license. So, we should NOT add a license node if the license is defined via a name and the inline isn't provided, but we DO still keep it as part of the license expression.
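
The rule above could be sketched roughly as follows. This is only an illustration of the decision, not the change that landed in GUAC: declaredLicense and plan are hypothetical names, and joining entries with " AND " is an assumption made to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
)

// declaredLicense is a hypothetical, simplified view of one license
// entry taken from an SBOM.
type declaredLicense struct {
	ID     string // SPDX identifier, if any
	Name   string // free-form name, if any
	Inline string // full license text, if any
}

// plan returns the license expression string and the list of licenses
// that would get their own node. Every entry stays in the expression;
// a name-only entry without inline text gets no node.
func plan(entries []declaredLicense) (string, []string) {
	var parts, nodes []string
	for _, e := range entries {
		switch {
		case e.ID != "":
			parts = append(parts, e.ID)
			nodes = append(nodes, e.ID)
		case e.Name != "":
			parts = append(parts, e.Name)
			if e.Inline != "" {
				nodes = append(nodes, e.Name)
			}
		}
	}
	// Assumption: entries are combined with AND.
	return strings.Join(parts, " AND "), nodes
}

func main() {
	expr, nodes := plan([]declaredLicense{
		{ID: "MIT"},
		{Name: "Ben's Cool License"},
		{Name: "Permissive 3000", Inline: "full license text"},
	})
	fmt.Println("expression:", expr)
	fmt.Println("nodes:", nodes)
}
```

In this sketch "Ben's Cool License" still appears in the expression but produces no node, which matches the decision from the maintainer call.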

@funnelfiasco
Contributor

I retract my earlier comment after discussion in today's Maintainer Meeting. I misunderstood the shape of the problem, but I'm going to leave the comment there for posterity because there are some points worth preserving. My main concern with ignoring it is if there are several packages that use "Ben's Cool License", people won't be able to search for it. But that's going to be ambiguous and edge-case-y enough to not try to solve right now.

@pombredanne
Contributor

Have you considered using https://scancode-licensedb.aboutcode.org/ as a reference dataset? This is the largest such db this side of the galactic quadrant. (And unfortunately it is dumbed down in ClearlyDefined scan merges when reported by ScanCode there.)

@pxp928
Collaborator

pxp928 commented Sep 27, 2024

@pombredanne ah very interesting! This would be great to integrate into GUAC as another data point for licenses

@pombredanne
Contributor

@funnelfiasco re:

There's also the question of "what is a valid license?" If I released some software under the Permissive 3000 license that I wrote (translated?), it's not in SPDX nor is it OSI-approved, but it's certainly a valid license in the sense that it does what licenses are supposed to do.

If this exists and is used, it is a valid license in my book. OSI and SPDX have nothing to do with the existence of a license; rather, they help clarify and categorize these licenses.

As for the Permissive 3000 license, this is a fine license but it sees little actual usage, so we track it only as the text of a generic, permissive license in ScanCode at https://github.com/aboutcode-org/scancode-toolkit/blame/9a340fc36b971bcc04fdf255ee73bf88ce39635a/src/licensedcode/data/rules/other-permissive_210.RULE#L12 ... most occurrences in the wild are found in ScanCode forks https://github.com/search?q="list+of+conditions+and+the+following+refusal+of+responsibility."&type=code ;)

In the GUAC context, you should IMHO always track whatever asserted license you receive as-is. If you are lucky, it is correct. It is going to be inaccurate and creative in some (or many) cases, as many (or most) SBOM tools do something between a poor and a bad job of reporting licensing.

I'd suggest that you enrich this after the fact with correct data from ScanCode scans (directly, which is recommended), retrieved from PurlDB, or raw from ClearlyDefined.

@pombredanne
Contributor

pombredanne commented Sep 27, 2024

And as for getting things from ClearlyDefined, I'd focus on the raw "harvest" as we do here https://github.com/aboutcode-org/purldb/blob/main/clearcode/

@lumjjb
Contributor

lumjjb commented Oct 28, 2024

Fixed by #2164

@lumjjb lumjjb closed this as completed Oct 28, 2024
6 participants