[Question] Clean old file with Robinhood #32

Open · garadar opened this issue Feb 7, 2025 · 1 comment

garadar commented Feb 7, 2025

Description

I am deploying a Robinhood instance to scan the filesystem and purge unused "old files." We define old files as those that have not been read or written in the last 90 days.

Since BeeGFS is not Lustre (thank you, Captain Obvious 👑), Robinhood has to scan the entire filesystem to keep its database up to date, instead of reading changelogs as it would on Lustre.

Here are my current rules:

rule old_files {
    target_fileclass = user_file;
    condition { last_access > 90d }
    action = common.unlink;
}

However, BeeGFS is mounted with the relatime flag, as shown below:

(cluster)-[root@robinhood ~]$ mount | grep /home
beegfs_home on /home type beegfs (rw,relatime,cfgFile=/etc/beegfs/home.d/beegfs-client.conf,_netdev)

Problem

According to the [BeeGFS documentation](https://doc.beegfs.io/latest/advanced_topics/storage_tuning.html#mount-options):

> Enabling last file access time is inefficient because the file system needs to update the time stamp by writing data to the disk even in cases when the user only reads file contents or when the file contents have already been cached in memory, and no disk access would have been necessary at all.

Since relatime is enabled, last_access timestamps may not be updated accurately when files are read, leading to potential unintended deletions of recently accessed files.
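
As a quick sanity check (the path below is just an example on the BeeGFS mount, and this assumes the client follows standard Linux relatime behaviour), something like this should show whether a plain read bumps atime here, and whether an immediate second read does not:

f=/home/$USER/atime_probe        # example path on the BeeGFS mount
echo test > "$f"
stat -c '%n  atime=%x  mtime=%y' "$f"
sleep 2
cat "$f" > /dev/null                      # read only, no write
stat -c '%n  atime=%x  mtime=%y' "$f"     # relatime should refresh atime here (old atime <= mtime)
sleep 2
cat "$f" > /dev/null                      # immediate second read
stat -c '%n  atime=%x  mtime=%y' "$f"     # ...but not here, until mtime changes or ~24h pass
rm -f "$f"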

Question

How can I properly handle this situation to ensure only truly unused files are deleted, without accidentally removing files that have been read but not modified?

garadar (author) commented Feb 7, 2025

According to the Red Hat Enterprise Linux 6 documentation (probably still valid for newer releases):

> relatime maintains atime data, but not for each time that a file is accessed. With this option enabled, atime data is written to the disk only if the file has been modified since the atime data was last updated (mtime), or if the file was last accessed more than a certain amount of time ago (by default, one day).
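
If the BeeGFS client really follows those semantics, atime should trail the most recent read by at most about a day, so a 90-day last_access threshold would at worst be off by roughly one day. A rough check of the scenario I actually care about (an old file that gets read again), using an example path on the BeeGFS mount:

f=/home/$USER/relatime_demo      # example path on the BeeGFS mount
echo test > "$f"
touch -d '100 days ago' "$f"     # backdate atime and mtime to simulate an "old" file
stat -c 'before read: atime=%x  mtime=%y' "$f"
cat "$f" > /dev/null             # read only, no write
stat -c 'after read:  atime=%x  mtime=%y' "$f"   # relatime should refresh atime, so the file no longer matches last_access > 90d
rm -f "$f"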
