feat: introduce opossum files to CLI #173

Hellgartner · 2025-01-13T10:27:01Z

Summary of changes

Read .opossum files with the library.
Currently this is mainly a pass-through: processing a file should produce the same file again as output.
See #49

Context and reason for change

While the pass-through is currently mainly a validation step, it is an important precondition for the planned merge functionality.

How can the changes be tested

Take any opossum file that contains only an input.json. Process it and observe that the output is unchanged. For a simple file, this should also be part of the CI.
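A minimal sketch of the pass-through check described above, using only the standard library. It assumes the archive member is named `input.json` (as mentioned here) and compares parsed JSON rather than raw bytes, since indentation, key order, or zip compression may legitimately differ between input and output. The function names are hypothetical, not part of opossum_lib.

```python
import json
import zipfile
from pathlib import Path
from typing import Any

# Assumption: name of the input data member inside a .opossum (zip) archive.
INPUT_JSON_NAME = "input.json"


def load_input_json(path: Path) -> Any:
    """Parse input.json from a .opossum archive."""
    with zipfile.ZipFile(path) as zf:
        # json.loads accepts bytes directly, so no explicit decode is needed.
        return json.loads(zf.read(INPUT_JSON_NAME))


def is_pass_through_unchanged(original: Path, processed: Path) -> bool:
    """Compare parsed JSON, so formatting, key order and compression
    differences between the two archives do not count as changes."""
    return load_input_json(original) == load_input_json(processed)
```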


mstykow · 2025-01-14T15:22:46Z

src/opossum_lib/cli.py

@@ -27,6 +29,12 @@ def opossum_file() -> None:
    multiple=True,
    type=click.Path(exists=True),
 )
+@click.option(
+    "--opossum",
+    help="opossum files used as input.",


Since you're using the plural, it sounds like this is a list of files, but in reality each --opossum flag takes exactly one argument. To improve clarity, I suggest something like: "Specify a path to a .opossum file that you would like to include in the final output. Option can be repeated."

Also note that I'm including "path" in the help message to make it clear that this must be a valid file path.
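A sketch of the option with the suggested wording, reduced to just this flag. The command body and its echo output are illustrative only. With `multiple=True`, click collects each `--opossum` occurrence into a tuple, and `click.Path(exists=True)` rejects paths that do not exist.

```python
import click


@click.command()
@click.option(
    "--opossum",
    multiple=True,
    type=click.Path(exists=True),
    help=(
        "Specify a path to a .opossum file that you would like to include "
        "in the final output. Option can be repeated."
    ),
)
def generate(opossum: tuple[str, ...]) -> None:
    # Each repetition of --opossum contributes exactly one path to the tuple.
    for path in opossum:
        click.echo(path)
```

Usage: `generate --opossum a.opossum --opossum b.opossum` yields the tuple `("a.opossum", "b.opossum")`.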

mstykow · 2025-01-14T15:25:02Z

src/opossum_lib/cli.py

@@ -62,5 +63,26 @@ def generate(spdx: list[str], outfile: str) -> None:
    write_opossum_information_to_file(opossum_information, Path(outfile))


+def validate_input_exit_on_error(spdx: list[str], opossum: list[str]) -> None:


Maybe clearer: "validate_input_and_exit_on_error". At first I read this as "validate input-exit on error".

mstykow · 2025-01-14T15:28:30Z

src/opossum_lib/opossum/file_generation.py

-        return element
+        z.writestr(
+            INPUT_JSON_NAME,
+            TypeAdapter(OpossumInformation).dump_json(


nice improvement 👍

mstykow · 2025-01-14T15:33:09Z

src/opossum_lib/opossum/opossum_file.py

+        # hack to override not serializing keys with corresponding value none:
+        # In this case this is valid and should be part of the serialization


Above, you're using exclude_none=True, so is it really a surprise that None values are not serialized?

mstykow · 2025-01-14T15:34:01Z

src/opossum_lib/opossum/opossum_file.py

    name: str
-    documentConfidence: int | None = 0
+    documentConfidence: int | float | None = 0
+    additionalName: str | None = None


Not the first time, but I'm surprised to see camel case here. In Python, snake case is the usual standard. Is there some special need for camel case here?
If it's because of serializing to JSON, pydantic has a configuration option to map snake_case field names to camelCase keys during serialization, if I remember correctly.

True about the standard: https://peps.python.org/pep-0008/#method-names-and-instance-variables

I also added this to "Make all opossum file models inherit from BaseModel" #178 (comment), where the model is being reworked anyway.

mstykow · 2025-01-14T15:41:15Z

src/opossum_lib/opossum/opossum_file.py

-ResourcePath = str
+type OpossumPackageIdentifier = str
+type ResourcePath = str
+type ResourceInFile = dict[str, ResourceInFile] | int


In which file? Perhaps better: "OpossumFileResource"?

What about:

The "Resource" class does not describe the file model anyway, so remove it from this file.

Then rename ResourceInFile to Resource to match the model (and perhaps rename the old Resource to make the two easier to distinguish).

A good time to change this is probably when tackling #178.
I think we could move the current Resource type fully to the spdx section because:

* the opossum code now uses the (to be renamed) ResourceInFile
* the scancode code uses its own tree data structure and used to convert to Resource, but this is historic and could be changed very easily (in fact, I'll just do it now)

We could then rename ResourceInFile to just Resources or OpossumResources to match the top-level key, and perhaps make it a full pydantic BaseModel with some convenience functions for construction. The advantage is a single point for the logic of how these resources are structured, so changing it (e.g. because of #38) would be easy. The small downside is that we need to hook into pydantic's serialization, which shouldn't be hard.

mstykow · 2025-01-14T15:41:48Z

src/opossum_lib/opossum/read_opossum_file.py

+                input_json = json.load(input_json_file)
+                return TypeAdapter(OpossumInformation).validate_python(input_json)
+    except Exception as e:
+        # handle the exception


What value does this comment add?

None. It's probably left over from accepting an autocomplete suggestion without checking it carefully.
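Instead of a placeholder comment, the handler could catch the specific failure modes and report them. A stdlib-only sketch: `OpossumFileError` is a hypothetical exception type, the pydantic validation step is left out, and the CLI layer would be responsible for logging and exiting.

```python
import json
import zipfile
from pathlib import Path
from typing import Any

INPUT_JSON_NAME = "input.json"  # assumption: member name inside the archive


class OpossumFileError(Exception):
    """Hypothetical error type for unreadable .opossum files."""


def read_input_json(path: Path) -> dict[str, Any]:
    try:
        with zipfile.ZipFile(path) as zf, zf.open(INPUT_JSON_NAME) as f:
            data = json.load(f)
    except zipfile.BadZipFile as e:
        raise OpossumFileError(f"{path} is not a valid zip archive") from e
    except KeyError as e:
        # ZipFile.open raises KeyError when the member does not exist
        raise OpossumFileError(f"{path} does not contain {INPUT_JSON_NAME}") from e
    except json.JSONDecodeError as e:
        raise OpossumFileError(f"{INPUT_JSON_NAME} in {path} is not valid JSON: {e}") from e
    if not isinstance(data, dict):
        raise OpossumFileError(f"{INPUT_JSON_NAME} in {path} is not a JSON object")
    return data
```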

mstykow · 2025-01-14T15:42:43Z

src/opossum_lib/opossum/read_opossum_file.py

+    if OUTPUT_JSON_NAME in input_zip_file.namelist():
+        logging.error(
+            f"Opossum file {input_zip_file.filename} also contains"
+            f" '{OUTPUT_JSON_NAME}' which cannot be processed"
+        )
+        sys.exit(1)


Why do we care? Can we just ignore the output file?

I would still argue that silently ignoring it would surprise the user.
There is a follow-up issue to add a force modifier that implements that behavior:
#177
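A sketch of the check with the follow-up behavior from #177 folded in. The `force` parameter is hypothetical (it only mirrors the idea of a force modifier), `output.json` as the member name is an assumption, and the caller would still call `sys.exit(1)` when this returns False.

```python
import logging
import zipfile

OUTPUT_JSON_NAME = "output.json"  # assumption: member name of the output file


def may_process(input_zip: zipfile.ZipFile, force: bool = False) -> bool:
    """Return True if the archive may be processed.

    `force` is a hypothetical flag sketching the follow-up in #177.
    """
    if OUTPUT_JSON_NAME not in input_zip.namelist():
        return True
    if force:
        logging.warning(
            "Ignoring '%s' in %s because force was requested",
            OUTPUT_JSON_NAME,
            input_zip.filename,
        )
        return True
    logging.error(
        "Opossum file %s also contains '%s' which cannot be processed",
        input_zip.filename,
        OUTPUT_JSON_NAME,
    )
    return False
```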

Hellgartner added 2 commits January 13, 2025 10:49

feat: introduce opossum files to CLI

a8ebc17

* create interface with basic test
* create empty function shell for conversion

feat: read file and validate opossum file structure

119dabd

Hellgartner force-pushed the feat-opossum-file-parsing branch from b9c9711 to 119dabd January 13, 2025 11:46

Hellgartner added 7 commits January 14, 2025 10:07

feat: read file and validate opossum file structure -- first version

a0d714b

* Currently only contains the parts for which we already have a structure in our Opossum model
* This is not yet fully compliant with the full opossum model --> will be enhanced

refactor: Use pydantic default code for json serialization

f26f1e2

feat: Allow for extra metadata objects

ae67712

SideNote: This required switching Metadata to a BaseModel (instead of using the annotation) in order to use the extra="allow" option, which does not work in the annotation version.
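A minimal illustration of what `extra="allow"` does on a `BaseModel`, using a hypothetical two-field metadata model: unknown keys are kept in `model_extra` and serialized back, whereas the `BaseModel` default (`extra="ignore"`) would silently drop them.

```python
from pydantic import BaseModel, ConfigDict


class Metadata(BaseModel):
    """Hypothetical subset of the opossum metadata fields."""

    # extra="allow" keeps unknown keys instead of discarding them
    model_config = ConfigDict(extra="allow")

    projectId: str
    fileCreationDate: str


raw = {"projectId": "p1", "fileCreationDate": "2025-01-14", "customKey": {"a": 1}}
metadata = Metadata.model_validate(raw)
print(metadata.model_extra)  # {'customKey': {'a': 1}}
```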

feat: Include most frequent licenses

4e8bcd8

* Update the OpossumInformation datatype accordingly

feat: Include remaining opossum file properties

339336c

refactor: cleanup

73f691a

* Cleanup tests
* Introduce a few core constants

chore: fix windows tests

e1a0e65

Hellgartner force-pushed the feat-opossum-file-parsing branch from 22ec26a to e1a0e65 January 14, 2025 12:41

Hellgartner marked this pull request as ready for review January 14, 2025 12:42

abraemer reviewed Jan 14, 2025


Hellgartner added 2 commits January 14, 2025 15:59

refactor: pull request review comments

0b005c2

* Use type annotations more consistently
* Mark internal functions as private
* Remove unnecessary cast

feat: improve tests

ec9650d

* Shorten the assertion function
* Use it also for SPDX files

abraemer approved these changes Jan 14, 2025


Hellgartner merged commit e909288 into main Jan 14, 2025
7 checks passed

Hellgartner deleted the feat-opossum-file-parsing branch January 14, 2025 15:22

mstykow reviewed Jan 14, 2025


Hellgartner mentioned this pull request Jan 14, 2025

opossum file parsing #49

Closed

mstykow reviewed Jan 14, 2025

