Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script for pruning stale source archives #967

Merged

Conversation

taylormadore
Copy link
Contributor

@taylormadore taylormadore commented Jan 23, 2024

This PR adds a script that can list and (potentially) delete source archives older than a certain threshold.

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • New code has type annotations
  • [n/a] OpenAPI schema is updated (if applicable)
  • [n/a] DB schema change has corresponding DB migration (if applicable)
  • [n/a] README updated (if worker configuration changed, or if applicable)
  • Draft release notes are updated before merging

@taylormadore taylormadore force-pushed the prune-archives-script branch from 8c3f819 to fa183a4 Compare January 23, 2024 14:20
cachito/workers/prune_archives.py Dismissed Show dismissed Hide dismissed
tests/test_workers/test_prune_archives.py Fixed Show fixed Hide fixed


@app.command()
def list(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're shadowing builtin list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

execute: Annotated[bool, typer.Option(help="Actual deletion will only occur if True.")] = False,
):
"""
List and delete stale source archives.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense for delete() to call list(), rather than duplicating the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked this a bit in the final. Let me know if the changes work for you 👍

@taylormadore taylormadore force-pushed the prune-archives-script branch from fa183a4 to a1a7473 Compare January 25, 2024 00:58
@taylormadore taylormadore marked this pull request as ready for review January 25, 2024 14:23
Comment on lines 414 to 416
"created": None
if self.created is None
else self.created.isoformat(timespec="microseconds"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: if conditional assignments don't fit on a single line we should probably have a local variable for it which we can assign conditionally (if the main cause was indentation) or expand the conditionals to make the code more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

typer.Option(
callback=_validate_older_than,
formats=["%Y-%m-%d"],
help="Deletes archives that are older than the specified date (UTC). YYYY-MM-DD",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick:

"Deletes archives that are older than the specified date (UTC)

  1. We should never ask end users to enter anything in UTC, because nobody wants to do the calculation in their head, we need to do the conversions automatically
  2. mentioning UTC in context of plain dates might lead to the following question "Uhm, so I want to prune everything older than in my timezone, but is always T00:00 which means that the UTC date would look differently, so which value do I put in there???" I mean this is pure bikeshedding and for the purpose of this script it doesn't matter at all, just something to be aware of.
  3. using replace() on datetime instances doesn't adjust the time values underneath [1], it just forces a new field to the object meaning that you may have filtered out some values out of the evaluation, but here again, not a big deal in this particular case, it just brings questions to however reads the code.

For the sake of argument what I would suggest is to drop the UTC part from the help message, treat all inputs as local timezone to a particular system and then do

datetime.datetime.strptime("<timespec>", <format>).astimezone(datetime.timezone.utc)

in all related places to make sure we represent and base everything correctly.

[1] https://docs.python.org/3/library/datetime.html#datetime.datetime.replace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

User inputs should now be localtime and are converted to UTC

datetime.isoformat will truncate microseconds if they are zero,
which requires handling with multiple format strings when using
strptime

Signed-off-by: Taylor Madore <[email protected]>
cachito_archives_default_age_days will be the default age in days
after which source archives are purged

cachito_archives_minimum_age_days will be the minimum age in days
before which source archives should not be purged

STONEBLD-1990

Signed-off-by: Taylor Madore <[email protected]>
typer will be used for a script to prune stale cachito
source archives and ratelimit will be used to prevent
that script from potentially overwhelming the API with
requests for what is a maintenance activity

STONEBLD-1990

Signed-off-by: Taylor Madore <[email protected]>
The script will iterate over stored source archives and
attempt to locate the most recent request for them via
the API. Archives determined to be stale will be deleted.

STONEBLD-1990

Signed-off-by: Taylor Madore <[email protected]>
@taylormadore taylormadore force-pushed the prune-archives-script branch from a1a7473 to b683ca2 Compare February 2, 2024 18:50

return _ResolvedArchive(
parsed_archive.path,
datetime.strptime(latest_request["created"], "%Y-%m-%dT%H:%M:%S.%f").replace(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the request timestamps already UTC adjusted or do we need to use the same treatment here as with user CLI input? I'm just curious since I don't see a 'Z' in the unit test timestamps and I'm too lazy to go and check the deployment for real data :).
It's still a nitpick though, so even if it weren't no big deal since I doubt the time shift would make a significant difference in terms of archive cleanup anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the request timestamps returned from the DB all should be UTC already. They're the values formatted by datetime.isoformat() in the first commit here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taylormadore can you specifically request review from me so that my approval counts?

@taylormadore taylormadore added this pull request to the merge queue Feb 5, 2024
Merged via the queue into containerbuildsystem:master with commit 688837e Feb 5, 2024
14 checks passed
@taylormadore taylormadore deleted the prune-archives-script branch February 5, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants