Make directory cleanup more robust #194

Open

clelange opened this issue Jan 13, 2022 · 2 comments

Comments

@clelange
Collaborator

clelange commented Jan 13, 2022

See discussion in #193

I support making this overall more robust. However, I think that tacking on individual checks for specific edge cases might not be the best approach as we might still miss something. How about instead, we generally force the user to use a previously nonexistant directory the first time they execute create_files. That would require us to preserve between runs the knowledge of whether or not a given directory was originally created by hepdata_lib. We could easily accomplish this by depositing an empty signifier file (e.g. $DIRECTORY/.created_by_hepdata_lib) in the desired directory. Each time it is run, create_files would check whether the output directory exists already and whether the signifier file exists. If the directory does not yet exist, we proceed as normal with creating the directory as well as the signifier file. If the directory exists, but the file does not, we exit and give the user a warning telling them that they should use a dedicated empty directory in order to avoid trouble.

Right this moment, though, we have code published on PyPI that can accidentally wipe user files with default settings. Therefore, let's please merge this hotfix and mint a new version. That buys us a little time to think through how to really fix this once and for all.

Originally posted by @AndreasAlbert in #193 (comment)

@AndreasAlbert
Collaborator

The alternative would be to actively track individual files we have created and only ever delete those. Basically, whenever we create an output file, we would write its name into a persistent storage file in the output directory. Then, when deletion time comes, we only delete whatever was in that file.
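
A rough sketch of what that tracking could look like (helper and file names are hypothetical):

```python
import os

MANIFEST = ".hepdata_lib_manifest"  # hypothetical per-directory tracking file

def record_created_file(outdir, filename):
    """Append a file we just wrote to the manifest in the output directory."""
    with open(os.path.join(outdir, MANIFEST), "a") as manifest:
        manifest.write(filename + "\n")

def cleanup_tracked_files(outdir):
    """Delete only files listed in the manifest, then the manifest itself."""
    manifest_path = os.path.join(outdir, MANIFEST)
    if not os.path.isfile(manifest_path):
        return  # nothing was tracked, so nothing gets deleted
    with open(manifest_path) as manifest:
        for name in manifest.read().splitlines():
            path = os.path.join(outdir, name)
            if os.path.isfile(path):
                os.remove(path)
    os.remove(manifest_path)
```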

@clelange
Collaborator Author

> The alternative would be to actively track individual files we have created and only ever delete those. Basically, whenever we create an output file, we would write its name into a persistent storage file in the output directory. Then, when deletion time comes, we only delete whatever was in that file.

Tracking all this would be very involved. I see too many failure scenarios, e.g. the user deleting the file that does the tracking, the script being cancelled mid-run, etc.

I would prefer to proceed as I wrote in #193:

  1. do not allow the output directory to be the same as the directory in which the Python files reside (and the script directory must also not be a subdirectory of the output directory); see the sketch after this list
  2. in addition, we could also force that a nonexistent directory is used as output the first time the script is run, which could be checked (as suggested above) by creating a file $DIRECTORY/.created_by_hepdata_lib or similar, but I'm not sure how robust this would be
  3. add an additional confirmation prompt when a directory would be deleted, which could be overridden by a --force-cleanup flag or similar
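
A minimal sketch of check 1 (illustrative only, not actual hepdata_lib code):

```python
import os

def check_output_dir(outdir, script_path):
    """Refuse output directories that coincide with or contain the script directory."""
    script_dir = os.path.abspath(os.path.dirname(script_path))
    outdir = os.path.abspath(outdir)
    if outdir == script_dir or script_dir.startswith(outdir + os.sep):
        raise ValueError(
            "Refusing to use '%s' as output directory: it contains the "
            "running script, so a cleanup could delete user code." % outdir
        )
```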

Maybe the relatively simple directory check would suffice, since the cleanup is now disabled by default?
