Prevent driver from overwhelming during orphan file removal #13084

Open: karuppayya wants to merge 2 commits into main

karuppayya (Contributor) commented on May 16, 2025

Currently, all orphaned file paths are collected on the driver before the files are deleted. When a table has millions of orphaned files, this can overwhelm the driver.

This change deletes the files in a distributed manner (with a default parallelism of 10, to avoid throttling), so the list of orphaned paths no longer has to be sent back to the driver. The operation returns the total count of deleted files.

For backward compatibility, the results still include the full paths of all orphaned files by default; this behavior is configurable.

TODO

  • Make the distributed delete parallelism configurable
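The approach described above can be sketched in a single process as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the orphaned paths are assumed to be already split into partitions, a thread pool stands in for Spark executors, and the names `distributed_delete`, `delete_fn`, `parallelism`, `collect_paths`, and `DeleteSummary` are hypothetical. Only the default parallelism of 10 comes from the description. Each partition returns a count (plus its paths only when asked), so with `collect_paths=False` nothing larger than one integer per partition travels back to the caller.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Default taken from the PR description; all names below are hypothetical.
DEFAULT_DELETE_PARALLELISM = 10


@dataclass
class DeleteSummary:
    deleted_count: int
    # Full paths are gathered only when the caller asks for them.
    deleted_paths: Optional[List[str]]


def _delete_partition(
    paths: Sequence[str], delete_fn: Callable[[str], bool], collect_paths: bool
) -> Tuple[int, List[str]]:
    """Deletes one partition where it lives and returns a small result."""
    count = 0
    deleted: List[str] = []
    for path in paths:
        if delete_fn(path):  # delete_fn reports whether the file was removed
            count += 1
            if collect_paths:
                deleted.append(path)
    return count, deleted


def distributed_delete(
    partitions: Sequence[Sequence[str]],
    delete_fn: Callable[[str], bool],
    parallelism: int = DEFAULT_DELETE_PARALLELISM,
    collect_paths: bool = True,  # True keeps the backward-compatible output
) -> DeleteSummary:
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    # At most `parallelism` partitions are processed at once, which limits
    # how many delete calls hit the storage system concurrently.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(
            pool.map(lambda part: _delete_partition(part, delete_fn, collect_paths), partitions)
        )
    total = sum(count for count, _ in results)
    if not collect_paths:
        return DeleteSummary(total, None)
    return DeleteSummary(total, [p for _, part_paths in results for p in part_paths])


# Usage: an in-memory set stands in for the table location.
store = {f"data/orphan-{i:03d}.parquet" for i in range(25)}
lock = threading.Lock()


def delete_from_store(path: str) -> bool:
    with lock:
        if path in store:
            store.remove(path)
            return True
        return False


ordered = sorted(store)
partitions = [ordered[i:i + 10] for i in range(0, len(ordered), 10)]
summary = distributed_delete(partitions, delete_from_store, collect_paths=False)
print(summary.deleted_count, summary.deleted_paths)  # 25 None
```

Keeping `collect_paths=True` as the default mirrors the backward-compatible behavior mentioned above, while exposing `parallelism` as a parameter corresponds to the TODO item.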

karuppayya changed the title from "Prevent driver from overwhelimg during orphan file removal" to "Prevent driver from overwhelming during orphan file removal" on May 16, 2025
karuppayya closed this May 17, 2025
karuppayya reopened this May 17, 2025
karuppayya force-pushed the remove_tmp branch 2 times, most recently from 0c7e4bc to 2c2f66f, on May 19, 2025 21:13
karuppayya closed this May 19, 2025
karuppayya reopened this May 19, 2025
Karuppayya Rajendran and others added 2 commits May 20, 2025 10:35