Make directory cleanup more robust #194

Open

clelange opened this issue Jan 13, 2022 · 2 comments

Comments

@clelange
Collaborator

clelange commented Jan 13, 2022

See discussion in #193

I support making this overall more robust. However, I think that tacking on individual checks for specific edge cases might not be the best approach as we might still miss something. How about instead, we generally force the user to use a previously nonexistant directory the first time they execute create_files. That would require us to preserve between runs the knowledge of whether or not a given directory was originally created by hepdata_lib. We could easily accomplish this by depositing an empty signifier file (e.g. $DIRECTORY/.created_by_hepdata_lib) in the desired directory. Each time it is run, create_files would check whether the output directory exists already and whether the signifier file exists. If the directory does not yet exist, we proceed as normal with creating the directory as well as the signifier file. If the directory exists, but the file does not, we exit and give the user a warning telling them that they should use a dedicated empty directory in order to avoid trouble.

Right this moment, though, we have code published on PyPI that can accidentally wipe user files with default settings. Therefore, let's please merge this hotfix and mint a new version. That buys us a little time to think through how to really fix this once and for all.

Originally posted by @AndreasAlbert in #193 (comment)

@AndreasAlbert
Collaborator

The alternative would be to actively track individual files we have created and only ever delete those. Basically, whenever we create an output file, we would write its name into a persistent storage file in the output directory. Then, when deletion time comes, we only delete whatever was in that file.
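
A rough sketch of what that tracking could look like (helper and file names are hypothetical):

```python
import os

MANIFEST = ".hepdata_lib_manifest"  # hypothetical per-directory tracking file

def record_created_file(outdir, filename):
    """Append a file we just wrote to the manifest in the output directory."""
    with open(os.path.join(outdir, MANIFEST), "a") as manifest:
        manifest.write(filename + "\n")

def cleanup_tracked_files(outdir):
    """Delete only files listed in the manifest, then the manifest itself."""
    manifest_path = os.path.join(outdir, MANIFEST)
    if not os.path.isfile(manifest_path):
        return  # nothing was tracked, so nothing gets deleted
    with open(manifest_path) as manifest:
        for name in manifest.read().splitlines():
            path = os.path.join(outdir, name)
            if os.path.isfile(path):
                os.remove(path)
    os.remove(manifest_path)
```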

@clelange
Collaborator Author

> The alternative would be to actively track individual files we have created and only ever delete those. Basically, whenever we create an output file, we would write its name into a persistent storage file in the output directory. Then, when deletion time comes, we only delete whatever was in that file.

Tracking all this would be very involved. I see too many failure scenarios, e.g. the user deleting the file that does the tracking, the script being cancelled mid-run, etc.

I would prefer to proceed as I wrote in #193:

  1. do not allow the output directory to be the same as the directory in which the Python files reside (and the script directory must also not be a subdirectory of the output directory); see the sketch after this list
  2. in addition, we could also force that a nonexistent directory is used as output the first time the script is run, which could be checked (as suggested above) by creating a file $DIRECTORY/.created_by_hepdata_lib or similar, but I'm not sure how robust this would be
  3. add an additional confirmation prompt when a directory would be deleted, which could be overridden by a --force-cleanup flag or similar
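
A minimal sketch of check 1 (illustrative only, not actual hepdata_lib code):

```python
import os

def check_output_dir(outdir, script_path):
    """Refuse output directories that coincide with or contain the script directory."""
    script_dir = os.path.abspath(os.path.dirname(script_path))
    outdir = os.path.abspath(outdir)
    if outdir == script_dir or script_dir.startswith(outdir + os.sep):
        raise ValueError(
            "Refusing to use '%s' as output directory: it contains the "
            "running script, so a cleanup could delete user code." % outdir
        )
```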

Maybe the relatively simple directory check would suffice, since the cleanup is now disabled by default?
