Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: [vcpkg-downloads] Helper script to parse download.json #38656

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Neumann-A
Copy link
Contributor

No description provided.

Comment on lines 3 to 12
{
"output_variable" : "extracted_src_github",
"from": "github",
"repository" : "facebook/zstd",
"ref": "v1.5.6",
"sha512" : "ca12dffd86618ca008e1ecc79056c1129cb4e61668bf13a3cd5b2fa5c93bc9c92c80f64c1870c68b9c20009d9b3a834eac70db72242d5106125a1c53cccf8de8",
"head_ref": "dev",
"patches" : [],
"target_dir" : "${other_src_dir}/zstd"
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be:

Suggested change
{
"output_variable" : "extracted_src_github",
"from": "github",
"repository" : "facebook/zstd",
"ref": "v1.5.6",
"sha512" : "ca12dffd86618ca008e1ecc79056c1129cb4e61668bf13a3cd5b2fa5c93bc9c92c80f64c1870c68b9c20009d9b3a834eac70db72242d5106125a1c53cccf8de8",
"head_ref": "dev",
"patches" : [],
"target_dir" : "${other_src_dir}/zstd"
},
"extracted_src_github" : {
"from": "github",
"repository" : "facebook/zstd",
"ref": "v1.5.6",
"sha512" : "ca12dffd86618ca008e1ecc79056c1129cb4e61668bf13a3cd5b2fa5c93bc9c92c80f64c1870c68b9c20009d9b3a834eac70db72242d5106125a1c53cccf8de8",
"head_ref": "dev",
"patches" : [],
"target_dir" : "${other_src_dir}/zstd"
},

Basically dropping the output_variable field

Copy link
Contributor

@Thomas1664 Thomas1664 May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer "github": "output-variable": "...", ...} because otherwise the set of allowed fields would depend on the value of from which is way harder to parse compared to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't quite understand this comment: I still need from here and the fields always depend on where it comes from. (doesn't matter if that is depending on from or some other mechanism )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You suggested here to make the output variable the key of the object instead of having an array of keyless objects. I propose to make the value of from the key:

"github" : {
      "output-variable": "extracted_src_github",
      "repository" : "facebook/zstd",
      "ref": "v1.5.6",
      "sha512" : "ca12dffd86618ca008e1ecc79056c1129cb4e61668bf13a3cd5b2fa5c93bc9c92c80f64c1870c68b9c20009d9b3a834eac70db72242d5106125a1c53cccf8de8",
      "head_ref": "dev",
      "patches" : [],
      "target_dir" : "${other_src_dir}/zstd"
    }

This way you can decide more easily the download method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main benefit for that is schema validation, however you might end up with something like:

[
 "github" : { .... },
 "github" : { .... },
 "github" : { .... },
 "github" : { .... },
 "gitlab" : { .... },
 "git" : { .... },
]

Leading to something.0.github.--- something.1.github.--- something.2.github.---. The is no meaning in that selection if there is duplicates.

While:

[
 "src1" : { .... },
 "src2" : { .... },
 "src3" : { .... },
 "src4" : { .... },
]

Actually has meaning since srcX has to be an unique identifier since otherwise outputs get overwritten. I also see the json keys as a right to left assignment in this case (data on the right creates the key on the left).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leading to something.0.github.--- something.1.github.--- something.2.github.---. The is no meaning in that selection if there is duplicates.

Why is this a problem? To my understanding, JSON arrays are allowed to have multiple elements (objects) with the same key, e.g. "{"key": "value", "key": "value"} is not allowed whereas ["key": {"subkey": "value"}, "key": {"subkey": "value"}] is allowed.

data on the right creates the key on the left

The output variable itself is data. According to this, it should be on the right whereas github or gitlab are less data-ish. I think JSON should go from general to specific. The output variable is a lot more specific than github or gitlab.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

["key": {"subkey": "value"}, "key": {"subkey": "value"}]

This is no valid JSON...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would need to be [ { "key": {"subkey": "value"} }, { "key": {"subkey": "value"} } ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the output_variable as key variant, because you prevent collisions by design with that.

@Osyotr
Copy link
Contributor

Osyotr commented May 9, 2024

Sometimes there are per-platform and per-feature downloads, so "platform" and "features"expressions are needed.

@Thomas1664
Copy link
Contributor

I think the goal of this PR is to fix SBOM generation as suggested by @ras0219-msft here. Because that would require parsing download.json from the tool anyway, I don't see the point in parsing it from cmake just to call back into the tool to actually perform the download. Furthermore, parsing download.json from cmake would lock us in to always parse it from cmake because the tool doesn't have a way to detect weither it should start the download or cmake calls back into the tool to start the download.

That being said, I'd rather want to see vcpkg_from_* to generate download.json, put it in ${CURRENT_BUILD_DIR}, and let the tool parse it.

@WangWeiLin-MV WangWeiLin-MV added category:infrastructure Pertaining to the CI/Testing infrastrucutre category:new-port The issue is requesting a new library to be added; consider making a PR! labels May 9, 2024
@Neumann-A
Copy link
Contributor Author

Sometimes there are per-platform and per-feature downloads, so "platform" and "features"expressions are needed.

I can add an if field for that

@Neumann-A
Copy link
Contributor Author

I think the goal of this PR is to fix SBOM generation as suggested by

No the goal is to make updating ports easier. Updating a JSON is simply than trying to find the correct lines in a cmake file.
If it has other benefits that is even better ;)

I don't see the point in parsing it from cmake just to call back into the tool

It is already implemented in cmake that is the point ;). Feel free to port the implementation to the tool. I rather start getting implementation experience in CMake than to bake it directly into the tool on the first try.

would lock us in to always parse it from cmake because the tool doesn't have a way to detect weither it should start the download or cmake calls back into the tool to start the download.

That is incorrect. Why? Because vcpkg_from_* functions already do it. You could apply a similar approach for vcpkg_download_from_json and just inform vcpkg about the current state of the required CMake variables and return the required output cmake variables back to cmake.

That being said, I'd rather want to see vcpkg_from_* to generate download.json, put it in ${CURRENT_BUILD_DIR}, and let the tool parse it.

That only works in a sequentiell manner doesn't it?

@Thomas1664
Copy link
Contributor

It is already implemented in cmake that is the point

This PR would break the current SBOM file generation logic in the tool. Therefore, you have to parse download.json from within the tool. At this point, you could also start the download directly from within the tool rather than calling back to it from cmake.

That only works in a sequentiell manner doesn't it?

What exactly do you mean by sequential? Everything in cmake is sequential.

@Neumann-A
Copy link
Contributor Author

This PR would break the current SBOM file generation logic in the tool. Therefore, you have to parse download.json from within the tool.

If this PR breaks that generation logic than that generation logic is more than incomplete. What exactly is broken here?

What exactly do you mean by sequential? Everything in cmake is sequential.

Yes in cmake everything is sequential, however having a download.json with all downloads instead of generating one for every vcpkg_from call enables parallel downloads and extractions by a later tool implementation without cmake noticing it at all.

Currently visible in portfile [stuff in () is hidden to the user]:

vcpkg -> cmake -> vcpkg_from_x (-> vcpkg (downloads)) -> cmake -> vcpkg_from_x (-> vcpkg (downloads)) -> ....

This PR:

vcpkg -> cmake -> vcpkg_download_from_json (-> cmake -> vcpkg (downloads) -> cmake -> vcpkg (downloads) -> cmake -> vcpkg (downloads) ) -> ....

So this PR ideally batches all downloads to a single function call (although portfiles are still able to break that batching up). Having everything batched up allows a later tool implementation to gather all needed information beforehand and perform the download and extraction step in parallel instead of having to wait for CMake to reach the next download part.

So that the workflow is then:
vcpkg -> cmake -> vcpkg_download_from_json (-> cmake -> vcpkg (downloads batched) ) -> ....

Again the main aim here is to improve automatic updates of ports.

Neumann-A added 2 commits May 9, 2024 20:29
Rename target-dir to copy-to
@Thomas1664
Copy link
Contributor

If this PR breaks that generation logic than that generation logic is more than incomplete. What exactly is broken here?

It parses cmake for vcpkg_from_* and then extracts all relevant information from there. The fix would be to directly parse the JSON file.

@Neumann-A
Copy link
Contributor Author

It parses cmake for vcpkg_from_* and then extracts all relevant information from there. The fix would be to directly parse the JSON file.

Ok so broken by design. I think autoantwort already had a better approach in a different PR. You run cmake in trace mode and analyze the trace afterwards.... not trying to apply a heuristic before. Anything else is broken by design.

@dg0yt
Copy link
Contributor

dg0yt commented May 9, 2024

You run cmake in trace mode and analyze the trace afterwards....

Or you make the functions write a machine-readable log.
However, both approaches are dynamic, requiring running CMake to execute the scripts, and actually to access the network.
download.json could be processed statically, even offline.

@autoantwort
Copy link
Contributor

If this gets integrated in the tool at some point, is there a preference in integrating that in the vcpkg.json file?
Then you would be able to simply attach downloads to features etc and don't need a "new" if syntax.

@Neumann-A
Copy link
Contributor Author

is there a preference in integrating that in the vcpkg.json file?

I would be against it since it would make the manifest even more overloaded.

attach downloads to features etc and don't need a "new" if syntax.

The alternative would be to add platform and feature fields to download.json. Main issue with platform is that that is not understandable in CMake unless someone defines the mapping. (linking features would be rather trivial). Although after thinking about it, it might be as easy as transforming each element to uppercase and put VCPKG_TARGET_IS_ before it (might be able to transform that with a regex).

So the last remaining CMake entries would be ${VERSION} and components of version to calculate filenames.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:infrastructure Pertaining to the CI/Testing infrastrucutre category:new-port The issue is requesting a new library to be added; consider making a PR!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants