-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script for pruning stale source archives #967
Add script for pruning stale source archives #967
Conversation
8c3f819
to
fa183a4
Compare
cachito/workers/prune_archives.py
Outdated
|
||
|
||
@app.command() | ||
def list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're shadowing builtin list
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
execute: Annotated[bool, typer.Option(help="Actual deletion will only occur if True.")] = False, | ||
): | ||
""" | ||
List and delete stale source archives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make more sense for delete()
to call list()
, rather than duplicating the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked this a bit in the final. Let me know if the changes work for you 👍
fa183a4
to
a1a7473
Compare
cachito/web/models.py
Outdated
"created": None | ||
if self.created is None | ||
else self.created.isoformat(timespec="microseconds"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: if conditional assignments don't fit on a single line we should probably have a local variable for it which we can assign conditionally (if the main cause was indentation) or expand the conditionals to make the code more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
cachito/workers/prune_archives.py
Outdated
typer.Option( | ||
callback=_validate_older_than, | ||
formats=["%Y-%m-%d"], | ||
help="Deletes archives that are older than the specified date (UTC). YYYY-MM-DD", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick:
"Deletes archives that are older than the specified date (UTC)
- We should never ask end users to enter anything in UTC, because nobody wants to do the calculation in their head, we need to do the conversions automatically
- mentioning UTC in context of plain dates might lead to the following question "Uhm, so I want to prune everything older than in my timezone, but is always
T00:00
which means that the UTC date would look differently, so which value do I put in there???" I mean this is pure bikeshedding and for the purpose of this script it doesn't matter at all, just something to be aware of. - using
replace()
ondatetime
instances doesn't adjust the time values underneath [1], it just forces a new field to the object meaning that you may have filtered out some values out of the evaluation, but here again, not a big deal in this particular case, it just brings questions to however reads the code.
For the sake of argument what I would suggest is to drop the UTC part from the help message, treat all inputs as local timezone to a particular system and then do
datetime.datetime.strptime("<timespec>", <format>).astimezone(datetime.timezone.utc)
in all related places to make sure we represent and base everything correctly.
[1] https://docs.python.org/3/library/datetime.html#datetime.datetime.replace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed 👍
User inputs should now be localtime and are converted to UTC
datetime.isoformat will truncate microseconds if they are zero, which requires handling with multiple format strings when using strptime Signed-off-by: Taylor Madore <[email protected]>
cachito_archives_default_age_days will be the default age in days after which source archives are purged cachito_archives_minimum_age_days will be the minimum age in days before which source archives should not be purged STONEBLD-1990 Signed-off-by: Taylor Madore <[email protected]>
typer will be used for a script to prune stale cachito source archives and ratelimit will be used to prevent that script from potentially overwhelming the API with requests for what is a maintenance activity STONEBLD-1990 Signed-off-by: Taylor Madore <[email protected]>
The script will iterate over stored source archives and attempt to locate the most recent request for them via the API. Archives determined to be stale will be deleted. STONEBLD-1990 Signed-off-by: Taylor Madore <[email protected]>
a1a7473
to
b683ca2
Compare
|
||
return _ResolvedArchive( | ||
parsed_archive.path, | ||
datetime.strptime(latest_request["created"], "%Y-%m-%dT%H:%M:%S.%f").replace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the request timestamps already UTC adjusted or do we need to use the same treatment here as with user CLI input? I'm just curious since I don't see a 'Z' in the unit test timestamps and I'm too lazy to go and check the deployment for real data :).
It's still a nitpick though, so even if it weren't no big deal since I doubt the time shift would make a significant difference in terms of archive cleanup anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the request timestamps returned from the DB all should be UTC already. They're the values formatted by datetime.isoformat() in the first commit here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taylormadore can you specifically request review from me so that my approval counts?
688837e
This PR adds a script that can list and (potentially) delete source archives older than a certain threshold.
Maintainers will complete the following section