Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rm: improve handling of very deep directories with -r #7324

Open
jfinkels opened this issue Feb 18, 2025 · 5 comments
Open

rm: improve handling of very deep directories with -r #7324

jfinkels opened this issue Feb 18, 2025 · 5 comments
Labels

Comments

@jfinkels
Copy link
Collaborator

In pull request #7304, we added a "quick fix" to support removing very deep directories with rm -r. The corresponding test case in GNU coreutils is in tests/rm/deep-2.sh. It would be better to have a more robust solution.

The issue is that calling path.metadata() on a path that is too long returns an InvalidFilename error. Instead what we should do is use a relative path name and keep setting the working directory as we recurse. It must be possible because std::fs::remove_dir_all(path) seems to work on the very long path.

Originally posted by @jfinkels in #7304 (comment)

@jfinkels
Copy link
Collaborator Author

Related to #7217

@dezgeg
Copy link
Contributor

dezgeg commented Feb 19, 2025

Instead of changing current working directory it's usually better to use syscalls like openat() or unlinkat() that can open path relative to a opened directory file descriptor (I expect remove_dir_all is using that under the hood, or it wouldn't be safe to call in a multithreaded program)

@ceteece
Copy link
Contributor

ceteece commented Feb 19, 2025

I'm currently working on #7217 and I agree that using statat and openat is probably the best approach and have looked into this some.

I looked to see if I could find a cross-platform implementation of openat-like functionality, and the closest thing I could find is cap-std / cap-primitives. For du, cap-std is almost perfect except for the fact that it returns an error if you try to follow a symlink that resolves to a path that is not rooted at the directory you are trying to openat/statat relative to. This is a feature that's baked into cap-std, and as far as I can tell there's no way getting around that check using the public APIs of either cap-std or cap-primitives. They do have cross-platform implementations that don't include the checks, but unfortunately those are not public APIs. It might be the case that the symlink check isn't an issue for the purposes of our rm implementation, in which case maybe using cap-std would be fine?

But if it is also an issue for rm, then there are other options:

For Unix-like platforms, rustix provides statat and openat, and is what cap-std uses as the basis for its non-Windows implementations. The only downsides are:

  1. (obviously) It doesn't work with Windows
  2. Calls to statat just give you a Stat object, so you'll have to figure out how to extract what you need from that (which I think varies a little bit across different platforms), as opposed to having a nice platform-agnostic struct like std::fs::Metadata.
  • additionally, I think on Linux you need to use statx instead of statat if you care about getting the file's creation time
  • it seems like the cap-primitives code can be a useful reference for determining how to handle stat-ing on different platforms

I haven't tested this myself yet, but do we also hit the path length errors when calling std::fs::metadata and std::fs::read_dir on Windows? If that's the case I assume maybe we'd want to also have openat-like functionality for Windows? I haven't come across any ready-made implementation for it, so maybe we'd just have to more or less copy the way it's implemented in cap-primitives.

Whether we also need a Windows implementation or not, if it seems like we have at least two utilities that need statat and openat, would it be worth creating our own implementations (that would probably end up being very similar to the private implementations in cap-primitives) that abstract away the differences between platforms? For example, we could have an implementation of statat that returns a nice Metadata-like struct, so we don't have to replicate that code for doing things like choosing to call statx vs. statat and doing whatever platform-specific parsing of the result is necessary.

Sorry for the huge info dump, I'd love to hear other folks' thoughts on this!

@jfinkels
Copy link
Collaborator Author

In rm, there is already a dependency on libc; you could start by developing a solution that works for Unix and then extend from there if you want to work on cross-platform support. https://docs.rs/libc/latest/libc/fn.openat.html

@ChrisDenton
Copy link

I haven't tested this myself yet, but do we also hit the path length errors when calling std::fs::metadata and std::fs::read_dir on Windows? If that's the case I assume maybe we'd want to also have openat-like functionality for Windows? I haven't come across any ready-made implementation for it, so maybe we'd just have to more or less copy the way it's implemented in cap-primitives.

On Windows the rust standard library will to care of opening long paths so there's no error. That said, an openat like functionality may still be an improvement. There is a proposal for the standard library to add a cross-platform interface for this but it's not yet implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants