Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce opossum files to CLI #173

Merged
merged 11 commits into from
Jan 14, 2025
Merged

Conversation

Hellgartner
Copy link
Contributor

Summary of changes

Read .opossum files with the library.
Currently this is mainly a pass-through that should result in the same file again on the output
See #49

Context and reason for change

While the pass-through is at the moment mainly a validation step, this is an important precondition to be able to use the planned merge functionality later

How can the changes be tested

Get any opossum file with only an input.json containd. Process it and observe no changes in the output. For a simple file this should also be part of the ci

* create interface with basic test
* create empty function shell for conversion
@Hellgartner Hellgartner force-pushed the feat-opossum-file-parsing branch from b9c9711 to 119dabd Compare January 13, 2025 11:46
* Currently only contains the things where we do have
structure for already in our Opossum model
* This is not fully compliant to the full opossum model yet
--> will be enhanced
SideNote: This required switching Metadata to a BaseModel
(instead of using the annotation) in order to successfully
use the extra="allow" option which does not work in the
annotation version
* Update the OpossumInformation datatype accordingly
* Cleanup tests
* Introduce a few core constants
@Hellgartner Hellgartner force-pushed the feat-opossum-file-parsing branch from 22ec26a to e1a0e65 Compare January 14, 2025 12:41
@Hellgartner Hellgartner marked this pull request as ready for review January 14, 2025 12:42
src/opossum_lib/opossum/opossum_file.py Show resolved Hide resolved
src/opossum_lib/opossum/opossum_file.py Outdated Show resolved Hide resolved
src/opossum_lib/opossum/opossum_file.py Outdated Show resolved Hide resolved
src/opossum_lib/opossum/opossum_file.py Outdated Show resolved Hide resolved
tests/test_cli.py Show resolved Hide resolved
* Use type annotations more consistently
* Mark internal functions as private
* Remove unnecessary cast
* Shorten the assertion function
* Use it also for SPDX files
@Hellgartner Hellgartner merged commit e909288 into main Jan 14, 2025
7 checks passed
@Hellgartner Hellgartner deleted the feat-opossum-file-parsing branch January 14, 2025 15:22
@@ -27,6 +29,12 @@ def opossum_file() -> None:
multiple=True,
type=click.Path(exists=True),
)
@click.option(
"--opossum",
help="opossum files used as input.",
Copy link
Member

@mstykow mstykow Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you're using plural, it sounds like this is a list of files but in reality you mean that each flag of --opossum has exactly one argument. so to improve clarity, i suggest you say something like this: "Specify a path to a .opossum file that you would like to include in the final output. Option can be repeated."

also note, that i'm including "path" in the help message to make it clear that this must be a valid file path.

@Hellgartner Hellgartner mentioned this pull request Jan 14, 2025
@@ -62,5 +63,26 @@ def generate(spdx: list[str], outfile: str) -> None:
write_opossum_information_to_file(opossum_information, Path(outfile))


def validate_input_exit_on_error(spdx: list[str], opossum: list[str]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe clearer: "validate_input_and_exit_on_error". at first i read this as "validate input-exit on error".

return element
z.writestr(
INPUT_JSON_NAME,
TypeAdapter(OpossumInformation).dump_json(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice improvement 👍

Comment on lines +37 to +38
# hack to override not serializing keys with corresponding value none:
# In this case this is valid and should be part of the serialization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above, you're using exclude_none=True so is it really a surprise that none values are not serialized?

Comment on lines 52 to +54
name: str
documentConfidence: int | None = 0
documentConfidence: int | float | None = 0
additionalName: str | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the first time, but i'm surprised to see camel case here. in python it's usually standard to use snake case. is there some special need for camel here?
if it's because of the serializing to JSON, pydantic has a configuration option to convert camel to snake during serialization if i remember correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourcePath = str
type OpossumPackageIdentifier = str
type ResourcePath = str
type ResourceInFile = dict[str, ResourceInFile] | int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in which file? perhaps better: "OpossumFileResource"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

  • The "Resource" class anyhow does not describe the file model --> Remove it from the file
  • Then just rename it to Resource to match the model (and maybe rename to old Resource for easier distinguishing)

Copy link
Contributor

@abraemer abraemer Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good time to change this is probably when tackling #178
I think we could move the current Resource type fully to the spdx section because:

  • the opossum code now uses the (to be renamed) ResourceInFile
  • the scancode code uses its own tree data structure and used to convert to Resource but this is historic and could be changed very easily (in fact I'll just do it now)

We then could rename ResourceInFile just to Resources or OpossumResources to match the top-level key and perhaps make it a full pydantic.Model with some convenience functions for construction. That has the advantage that there is a single point for the logic of how these resources are structured and changing it (e.g. because #38) would be easy. The small downside is that we need to hook into the serialization of pydantic which shouldn't be hard.

input_json = json.load(input_json_file)
return TypeAdapter(OpossumInformation).validate_python(input_json)
except Exception as e:
# handle the exception
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what value does this comment add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None ... probably a reminder from pressing autocomplete and not taking care appropriately

Comment on lines +41 to +46
if OUTPUT_JSON_NAME in input_zip_file.namelist():
logging.error(
f"Opossum file {input_zip_file.filename} also contains"
f" '{OUTPUT_JSON_NAME}' which cannot be processed"
)
sys.exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we care? can we just ignore the output file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still would argue that this would surprise the user.
There is a follow up issue to add a force modifier which implements that behavior
#177

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants