Skip to content

package manager: filter unpack errors on paths excluded by manifest #19500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Apr 9, 2024

Conversation

ianic
Copy link
Contributor

@ianic ianic commented Mar 31, 2024

Fixes #18089, checks the remaining 2 items, and also fixes #17460.

The first item was to create diagnostic errors in git unpack, instead of overwriting files with the same name on case insensitive file system. Implemented here.

The second one was filtering tar/git unpack errors if they are on path not included by manifest (build.zig.zon). For that, I created UnpackResult which collects errors from tar\git diagnostic, allows filtering them by included paths and emits remaining errors to an ErrorBundle.

Added tar tests where I create a tarball containing two files with the same name, simulating case insensitive file system, and then call Fetch.run with and without manifest which excludes those files. Can be run with zig test src/Package.zig.

I used Ian's collection of pathological packages and his nice trick to simulate case insensitive, no symlink file system by mounting FAT32 file. I made a test around that. It depends on access to github repository, and local FAT32 mount, but it covers a lot of cases both for tar and git packages. Should I remove it because of all these dependencies? (it will be skipped if FAT32 file system is not found).

I also tested on Windows and macOS with different packages and the results are the same as described in previous try.

@jacobly0 jacobly0 changed the title pacakge manager: filter unpack errors on paths excluded by manifest package manager: filter unpack errors on paths excluded by manifest Apr 1, 2024
@andrewrk
Copy link
Member

andrewrk commented Apr 2, 2024

Thanks for doing this other approach. I think it's a good sign that @ianprime0509's feedback from #19324 no longer applies.

I used Ian's collection of pathological packages and his nice trick to simulate case insensitive, no symlink file system by mounting FAT32 file. I made a test around that. It depends on access to github repository, and local FAT32 mount, but it covers a lot of cases both for tar and git packages. Should I remove it because of all these dependencies? (it will be skipped if FAT32 file system is not found).

Nice work creating these test cases. The dependencies are indeed a problem for upstreaming them. I think it would be best to maintain this test suite separately. I have done something similar with https://github.com/ziglang/contrib-testing - perhaps that could be a place for these extra tests to live?

@ianic ianic force-pushed the package_filter_errors branch from 9a8f006 to c22ff8f Compare April 3, 2024 11:43
@ianic
Copy link
Contributor Author

ianic commented Apr 3, 2024

I removed that complicated test from Fetch.zig, moving it to the zig-fetch-test.
There I symlinked zig repo so I can run that test on each change in the source without building zig binary. So I still get fast response during development.
There I also compare package manager output of locally build zig binary and release build on a few real word packages.

@ianic ianic force-pushed the package_filter_errors branch from c22ff8f to 262f995 Compare April 4, 2024 19:55
@ianic
Copy link
Contributor Author

ianic commented Apr 4, 2024

Rebasing to the latest changes in package manager.
Add unit test to cover this PR and the previous one #19111.
Update 'integration tests' project.

unpackResource now returns UnpackResult structure ( root_dir is now part of that structure) instead of cryptic ?[]const u8.

ianic added a commit to ianic/zig that referenced this pull request Apr 5, 2024
Based on file content. Detects elf magic header or shebang line.

Executable bit is ignored in hash calculation, as it was before this. So
packages hashes are not changed.

Reference:
ziglang#17463 (comment)

Fixes: 17463

Test is here:
https://github.com/ianic/zig-fetch-test/blob/7c4600d7bb263f9b72fe3d0b70071f42be89e25c/src/main.zig#L307
(if ziglang#19500 got accepted I'll move this test to the Fetch.zig)
andrewrk pushed a commit that referenced this pull request Apr 6, 2024
Based on file content. Detects elf magic header or shebang line.

Executable bit is ignored in hash calculation, as it was before this. So
packages hashes are not changed.

Reference:
#17463 (comment)

Fixes: 17463

Test is here:
https://github.com/ianic/zig-fetch-test/blob/7c4600d7bb263f9b72fe3d0b70071f42be89e25c/src/main.zig#L307
(if #19500 got accepted I'll move this test to the Fetch.zig)
@ianic ianic force-pushed the package_filter_errors branch from 262f995 to b7d1fea Compare April 6, 2024 23:29
@ianic
Copy link
Contributor Author

ianic commented Apr 8, 2024

To summarize changes in this PR:

lib/std/tar.zig

Type of root_dir in Diagnostic changed from nullable to non-nullable with default empty string value.
The idea was to avoid null checking on each place where this is used and make consistent with Cache.Path.sub_path.
Change is covered by test.

src/Package/Fetch/git.zig

Add unable_to_create_file to the diagnostic, like it is in tar diagnostic.

src/Fetch.zig

unpackTarball and unpackGitPack returns now UnpackResult. They fill UnpackResult with diagnostic errors instead of raising them. Those errors are then filtered for excluded paths in runResource when we have manifest. If there are remaining errors after filtering they are then raised with UnpackResult.bundleErrors.

The rest of changes in Fetch.zig are added UnpackResult structure and tests.

Test case files are in Fetch/testdata folder. Test embeds one test file, creates temporary directory, saves embedded file to the temporary directory, initializes Fetch with TestFetchBuilder, runs Fetch on the saved file (cache directory is also positioned in the temporary directory), analyzes cache directory output or fetch errors.

src/Package.zig

Expose Fetch test so we can test it with zig test src/Package.zig. Fetch imports "../Package.zig" making it not testable with zig test Fetch.zig.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do my suggestions make sense?

const pkg_dir = try unpackResource(f, resource, uri_path, tmp_directory);
// Fetch and unpack a resource into a temporary directory.
var unpack_result = try unpackResource(f, resource, uri_path, tmp_directory);
defer unpack_result.deinit();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the arena allocator for this data so that this deinit() is not needed. The general-purpose allocator causes contention between threads, whereas each thread has its own arena allocator. It also simplifies memory management to use the arena allocator.

Comment on lines 490 to 494
try unpack_result.filterErrors(filter);
if (unpack_result.hasErrors()) {
try unpack_result.bundleErrors(eb, try f.srcLoc(f.location_tok));
return error.FetchFailed;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try unpack_result.filterErrors(filter);
if (unpack_result.hasErrors()) {
try unpack_result.bundleErrors(eb, try f.srcLoc(f.location_tok));
return error.FetchFailed;
}
try unpack_result.validate(eb, f.location_tok, filter);

Request to combine all that logic into a validate function.

Comment on lines 1744 to 1745
allocator: std.mem.Allocator,
errors: std.ArrayListUnmanaged(Error) = .{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allocator: std.mem.Allocator,
errors: std.ArrayListUnmanaged(Error) = .{},
errors: []Error,

Comment on lines 1776 to 1789
fn free(self: Error, allocator: std.mem.Allocator) void {
switch (self) {
.unable_to_create_sym_link => |info| {
allocator.free(info.file_name);
allocator.free(info.link_name);
},
.unable_to_create_file => |info| {
allocator.free(info.file_name);
},
.unsupported_file_type => |info| {
allocator.free(info.file_name);
},
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn free(self: Error, allocator: std.mem.Allocator) void {
switch (self) {
.unable_to_create_sym_link => |info| {
allocator.free(info.file_name);
allocator.free(info.link_name);
},
.unable_to_create_file => |info| {
allocator.free(info.file_name);
},
.unsupported_file_type => |info| {
allocator.free(info.file_name);
},
}
}

Comment on lines 1796 to 1771
fn deinit(self: *UnpackResult) void {
for (self.errors.items) |item| {
item.free(self.allocator);
}
self.errors.deinit(self.allocator);
self.allocator.free(self.root_error_message);
self.allocator.free(self.root_dir);
self.* = undefined;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn deinit(self: *UnpackResult) void {
for (self.errors.items) |item| {
item.free(self.allocator);
}
self.errors.deinit(self.allocator);
self.allocator.free(self.root_error_message);
self.allocator.free(self.root_dir);
self.* = undefined;
}

ianic added 18 commits April 9, 2024 15:00
Test that UnpackResult prints same error output as diagnostic.
Test that we are still outputing same errors.
Report only errors which are not filtered by paths in build.zig.zon.
On case insensitive file systems, don't overwrite files with same name
in different casing. Add diagnostic error so caller could decide what to do.
Using test cases from:
https://github.com/ianprime0509/pathological-packages repository.
Depends on existence of the FAT32 file system. Folder is in FAT32 file
system because it is case insensitive and and does not support symlinks.

It is complicated test case requires internet connection, depends on
existence of FAT32 in the specific location. But it is so valuable for
development. Running `zig test Package.zig` is so much faster than
building zig binary and running `zig fetch URL`. Committing it here
although it should probably be removed.
Should include folder structure, at least root folder so it can be found
in pipeToFileSystem.
To be consistent with paths in manifest.
Make it consistent with Cache.Path sub_path.
Remove null check in many locations.
Prepare test cases, store them in Fetch/testdata.
They cover changes in this PR as well from previous one ziglang#19111.
Use stripRoot in less places. Strip it while copying error from
diagnostic to unpack result so other palaces can be free of this logic.
Reference:
ziglang#19500 (comment)

Arena is now used for Diagnostic (tar and git). `deinit` is not called on Diagnostic
allowing us to reference strings from Diagnostic in UnpackResult without
dupe.

That seamed reasonable to me. Instead of using gpa for Diagnostic, and
then dupe to arena. Or using arena for both and making dupe so we can deinit
Diagnostic.
@ianic ianic force-pushed the package_filter_errors branch from b7d1fea to 4151e6c Compare April 9, 2024 13:00
@ianic
Copy link
Contributor Author

ianic commented Apr 9, 2024

I switch to arena both UnpackResult and Diagnostic (tar and git). deinit is not called on Diagnostic
allowing us to reference strings from Diagnostic in UnpackResult without dupe.

That seemed reasonable to me. Instead of using gpa for Diagnostic, and then dupe to arena. Or using arena for both and making dupe so we can deinit Diagnostic.

@ianic ianic requested a review from andrewrk April 9, 2024 19:02
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Thanks for all the follow-ups.

Comment on lines +1802 to +1866
fn validate(self: *UnpackResult, f: *Fetch, filter: Filter) !void {
self.filterErrors(filter);
if (self.hasErrors()) {
const eb = &f.error_bundle;
try self.bundleErrors(eb, try f.srcLoc(f.location_tok));
return error.FetchFailed;
}
}

// Filter errors by manifest inclusion rules.
fn filterErrors(self: *UnpackResult, filter: Filter) void {
var i = self.errors_count;
while (i > 0) {
i -= 1;
if (self.errors[i].excluded(filter)) {
self.errors_count -= 1;
const tmp = self.errors[i];
self.errors[i] = self.errors[self.errors_count];
self.errors[self.errors_count] = tmp;
}
}
}

// Emmit errors to an `ErrorBundle`.
fn bundleErrors(
self: *UnpackResult,
eb: *ErrorBundle.Wip,
src_loc: ErrorBundle.SourceLocationIndex,
) !void {
if (self.errors_count == 0 and self.root_error_message.len == 0)
return;

const notes_len: u32 = @intCast(self.errors_count);
try eb.addRootErrorMessage(.{
.msg = try eb.addString(self.root_error_message),
.src_loc = src_loc,
.notes_len = notes_len,
});
const notes_start = try eb.reserveNotes(notes_len);
for (self.errors, notes_start..) |item, note_i| {
switch (item) {
.unable_to_create_sym_link => |info| {
eb.extra.items[note_i] = @intFromEnum(try eb.addErrorMessage(.{
.msg = try eb.printString("unable to create symlink from '{s}' to '{s}': {s}", .{
info.file_name, info.link_name, @errorName(info.code),
}),
}));
},
.unable_to_create_file => |info| {
eb.extra.items[note_i] = @intFromEnum(try eb.addErrorMessage(.{
.msg = try eb.printString("unable to create file '{s}': {s}", .{
info.file_name, @errorName(info.code),
}),
}));
},
.unsupported_file_type => |info| {
eb.extra.items[note_i] = @intFromEnum(try eb.addErrorMessage(.{
.msg = try eb.printString("file '{s}' has unsupported type '{c}'", .{
info.file_name, info.file_type,
}),
}));
},
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that all of this would then be combined into one function now, eliminating the need to mutate the errors array in filterErrors. However, I think this is good enough to merge, so I'll leave you with this idea to implement if you like it.

@andrewrk andrewrk merged commit 215de3e into ziglang:master Apr 9, 2024
10 checks passed
ianic added a commit to ianic/zig that referenced this pull request Apr 11, 2024
andrewrk pushed a commit that referenced this pull request Apr 11, 2024
@andrewrk
Copy link
Member

What is the suggested workflow for updating build.zig.zon in these tarballs? Particularly the ones that have duplicate files in them

@ianic
Copy link
Contributor Author

ianic commented Feb 26, 2025

Sorry, but not sure that I understand question.

I tried building zig-fetch-test with zig from newhash branch. Package with same file name with different case (file4, File4):

└── sample-0.0.0-AAAAALAeAAD_jKVNdoGmFmUC-swiWiaXET-TsnDrw29y
    ├── build.zig
    ├── build.zig.zon
    ├── dir1
    │   └── dir2
    │       └── dir3
    │           ├── file1
    │           ├── file2
    │           └── file3
    ├── dir4
    │   └── dir5
    │       └── dir6
    │           ├── file1 -> ../../../dir1/dir2/dir3/file1
    │           └── file2_symlink -> ../../../dir1/dir2/dir3/file2
    ├── dir7
    │   └── dir8
    │       └── dir9
    │           ├── file4
    │           └── File4
    └── src
        ├── main.zig
        └── root.zig

But, what I found that deeper dependencies can't be fetched any more. Root build.zig.zon has "mach" as dependency, and it is fetched fine. But mach has "mach-freetype" package dependency. That can't be fetched:

/home/ianic/Code/zig-fetch-test/zig-global-cache/tmp/5a19e3d55799b13e/mach-freetype-dc4a5d8ce14f8678f35bdaf197303091e22b1f27/build.zig.zon:2:13: error: name must be a valid bare zig identifier (hint: switch from string to enum literal)
    .name = "mach-freetype",

First level dependencies are fetched fine, but new zig requires changes in build.zig.zon for second level dependencies!
Second level dependencies should be allowed to use old style build.zig.zon also.

@andrewrk
Copy link
Member

andrewrk commented Feb 26, 2025

Hmm, I don't know how that can happen, since queueJobsForDeps has:

                .allow_missing_paths_field = true,
                .allow_missing_nonce = true,
                .allow_name_string = true,

Edit: did you perhaps try with yesterday's version rather than today's version, which did not take into account @castholm's feedback yet?

@ianic
Copy link
Contributor Author

ianic commented Feb 26, 2025

I'm using newhas branch, 5150b22 is last commit.
This is build.zig.zon which fails:

.{
    .name = .fetch,
    .version = "0.0.0",
    .nonce = 0x82e96343ea66b4bd,

    .dependencies = .{
        .mach = .{
            .url = "https://github.com/hexops/mach/archive/d4cd79440ea16bf59156cd57707e5833acb8e1b5.tar.gz",
            .hash = "12204662d0dc8a74270b1b219ad1b41e63568f7ae532a243a5fc38b1a95f58f756ae",
        },
    },

    .paths = .{
        "",
    },
}

@castholm
Copy link
Contributor

Parsing/validating the mach-freetype manifest fails because its name contains a hyphen, which is no longer legal in package names and listed as a breaking change in the #22994 description.

@ianic
Copy link
Contributor Author

ianic commented Feb 27, 2025

Mind the difference here. Requiring changes in my build.zig.zon when updating to latest compiler is different then requiring synchronous change in the whole tree of my dependencies.
There is nothing I can do, by editing my code, to make it buildable with new Zig.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants