-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix(vendor)!: direct extraction for registry sources #15514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Except this doesn't guarantee things are pristine until we vendor the contents of the |
The chance of this happening is low because This patch aims to be as simple as possible. The “direct extraction” is the correct fix but require more architectural changes. |
And that we delete the extracted So looking over this again, this does not fix #15090. This PR only changes which files we list, it doesn't change how we copy them which is what is causing the problem in #15090.
Is it all that big of an architectural change? Can we get an instance of the registry source and and call an extract method directly on it? |
How does it not fix #15090? cargo-vendor doesn't change any content of a registry package, at least
Maybe not really an architectural change, but we'll need to have an abstraction over local/remote/http registry (which is RegistryData/RegistrySource). Ensure |
`PathSource::list_files` has some heurstic rules for listing files. Those rules are mainly designed for `cargo package`. Previously, cargo-vendor relies on those rules to understand what files to vendor. However, it shouldn't use those rules because: * Package extracted from a `.crate` tarball isn't Git-controlled, some rules may apply differently. * The extracted package already went through `PathSource::list_files` during packaging. It should be clean enough. * Should keep crate sources from registry sources in a pristine state, which is exactly what vendoring is meant for. Instead, we switch to direct extraction into the vendor directory to ensure source code is the same as in the `.crate` tarball. There are some caveats: * The overwrite protection in `unpack_package` assumes the unpack directory is always `<pkg>-<version`>. We don't want to remove this, but for cargo-vendor supports vendoring without version suffix. For that case, we need a temporary staging area, and move the unpacked source then. * The heurstic in `PathSource::list_files` did something "good" in general cases, like excluding hidden directories. That means common directorys like `.github` or `.config` won't be vendored. After this, those get included. This is another round of churns. We might want to get other `cargo-vendor` changes along with this in one single release.
By doing direct extractions, source code is kept in its original state The old workaround is irrelevant now.
c6773a8
to
a89f513
Compare
Not sure if it is really needed, though Cargo had better follow what platform support says. > The behavior on Windows is the same on Windows 10 1607 and higher > if `FileRenameInfoEx` is supported by the filesystem; otherwise, > `from` can be anything, but `to` must *not* be a directory. https://doc.rust-lang.org/1.86.0/src/std/fs.rs.html#2430-2435
a89f513
to
b09d3de
Compare
Anyway, I got direct extraction working. Doesn't look too terrible than I thought! |
@rustbot note benchmark-result Also, the benchmark result shows that it is ~60% faster when vendoring cargo and its dependencies. On Apple M1 Pro with 500GiB APFS SSD
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I have not gone over this with a fine-toothed comb, but the overall structure looks good to me.
I'm not going to click merge right now in case anyone else wants to look closer, or you had any additional concerns or changes you were thinking of. If there isn't more progress by sometime next week, then I think it would be fine to just merge.
What does this PR try to resolve?
PathSource::list_files
has some heurstic rules for listing files. Those rules are mainly designed forcargo package
.Previously, cargo-vendor relies on those rules to understand what files to vendor. However, it shouldn't use those rules because:
.crate
tarball isn't Git-controlled, some rules may apply differently.PathSource::list_files
during packaging. It should be clean enough.Instead, we switch to direct extraction into the vendor directory
to ensure source code is the same as in the
.crate
tarball.How should we test and review this PR?
There is already a test
vendor::package_exclude
for the fix of #9054,though now I think it is not a good fix. The test change shows the correct behavior change.
I am not sure if we want more tests.
Also, there are some caveats with this fix:
unpack_package
assumes the unpackdirectory is always
<pkg>-<version
>.We don't want to remove this,
but for cargo-vendor supports vendoring without version suffix.
For that case, we need a temporary staging area,
and move the unpacked source then.
PathSource::list_files
did something "good" ingeneral cases, like excluding hidden directories. That means
common directories like
.github
or.config
won't be vendored.After this, those get included. This is another round of churns.
We might want to get other
cargo-vendor
changes along with thisin one single release.
Additional information
cargo vendor
vscargo package
inconsistency #9054cargo vendor
does not include license-file if not in package list #9575cargo vendor
doesn't generate.cargo_vcs_info.json
#11000cargo vendor
misses some files #14034cargo vendor
loses some nonempty directories #15080cargo vendor
should vendor crates in a pristine state #15090This also opens a door for
Cargo.toml
can break crates with deps. that readCargo.toml
#13191Since we are changing vendored source (again), we might want to remove the
.rej
/.orig
special rules in cargo-vendor, as well as look into the new source-dedup vendor dir layout.Summary Notes
Generated by triagebot, see help for how to add more