Migrate runs to archive table according to user/group configuration #3619

Open
ekazachkova opened this issue Jul 29, 2024 · 1 comment
Labels: kind/enhancement (New feature or request), state/has-doc (Issues that have documentation)


ekazachkova commented Jul 29, 2024

Background
Cloud Pipeline does not currently allow runs to be removed. It would be useful to support this, because a very large number of runs can degrade query performance.

The billing service may require run data for the whole period. Therefore, the API shall be able to return archived runs together with their statuses.

Approach

1. API

  • add a new metadata field run_archive_days for users or groups. The name of this field shall be configurable via the system.archive.run.metadata.key preference. The value is the number of days after which runs are archived: all runs that finished more than N days ago shall be archived for the user or group entity that provides this metadata value.
  • create a new DB table run_archive with the same structure as pipeline_run
  • create a new DB table archive_run_status_change with the same structure as run_status_change
  • the billing service shall be able to load archived data: support an archive flag for the GET run/activity API method. If enabled, run data from the run_archive and archive_run_status_change tables shall also be loaded for the requested period. Default: off.
  • add a new API method to support run migration: POST /runs/archive. The monitoring service shall call this method.

This operation shall be available to admin users only.

Steps:

  • collect metadata with run_archive_days key
  • extract owners and days from collected metadata
  • resolve the number of days for each owner:
    • if a user has no archive run metadata and belongs to multiple groups that have archive run metadata, the minimal number of days shall be used.
    • if a user has archive run metadata, this value shall be used regardless of any values in the groups' metadata.
  • load all master runs in terminal state where owner and end_date columns match the restriction from metadata.
  • load child runs
  • create archive runs
  • remove all dependents (see list below)
  • remove runs by ids
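
The rule for resolving days per owner in the steps above could be sketched as follows. This is a minimal illustration, not the actual Cloud Pipeline code: the function name `resolve_archive_days`, its parameters, and the group names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def resolve_archive_days(
    user_days: Optional[int],
    group_names: Iterable[str],
    group_days: Dict[str, int],
) -> Optional[int]:
    """Return the archive threshold in days for one user, or None if unset.

    user_days   - value from the user's own metadata (hypothetical input)
    group_names - groups the user belongs to
    group_days  - values from group metadata, keyed by group name
    """
    # A value set directly on the user always wins over group values.
    if user_days is not None:
        return user_days
    # Otherwise use the smallest value among the user's groups that define one.
    candidates = [group_days[name] for name in group_names if name in group_days]
    return min(candidates) if candidates else None


group_days = {"ROLE_A": 90, "ROLE_B": 30}
from_groups = resolve_archive_days(None, ["ROLE_A", "ROLE_B"], group_days)
from_user = resolve_archive_days(365, ["ROLE_A", "ROLE_B"], group_days)
not_configured = resolve_archive_days(None, ["ROLE_C"], group_days)
print(from_groups, from_user, not_configured)
```

A user for whom the function returns `None` has no archive configuration and would simply be skipped by the migration.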

DB table dependents:
run_user
pipeline_run_log
restart_run
pipeline_run_service_url
run_status_change
stop_serverless_run
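
The migration of one owner's runs might look roughly like the sketch below, which uses an in-memory SQLite database. Everything beyond the table names is an assumption made for illustration: the column names (`run_id`, `parent_id`, `owner`, `status`, `end_date`), the set of terminal statuses, the simplified one-column schema of the dependent tables, and the `archive_runs` helper itself.

```python
import sqlite3
from datetime import datetime, timedelta

# Dependent tables named in the issue; each is ASSUMED to have a run_id column.
DEPENDENTS = ["run_user", "pipeline_run_log", "restart_run",
              "pipeline_run_service_url", "run_status_change",
              "stop_serverless_run"]
TERMINAL = ("SUCCESS", "FAILURE", "STOPPED")  # assumed terminal statuses


def archive_runs(conn, owner, days, now, chunk_size):
    """Move one chunk of an owner's finished runs into the archive tables."""
    cutoff = (now - timedelta(days=days)).isoformat()
    # Master runs: no parent, terminal status, finished before the cutoff.
    masters = [row[0] for row in conn.execute(
        "SELECT run_id FROM pipeline_run WHERE owner = ? AND parent_id IS NULL "
        "AND end_date < ? AND status IN (?, ?, ?) LIMIT ?",
        (owner, cutoff, *TERMINAL, chunk_size))]
    if not masters:
        return 0
    marks = ",".join("?" * len(masters))
    children = [row[0] for row in conn.execute(
        f"SELECT run_id FROM pipeline_run WHERE parent_id IN ({marks})", masters)]
    ids = masters + children
    where = "WHERE run_id IN ({})".format(",".join("?" * len(ids)))
    with conn:  # copy and delete in a single transaction
        conn.execute(f"INSERT INTO run_archive SELECT * FROM pipeline_run {where}", ids)
        conn.execute("INSERT INTO archive_run_status_change "
                     f"SELECT * FROM run_status_change {where}", ids)
        for table in DEPENDENTS:
            conn.execute(f"DELETE FROM {table} {where}", ids)
        conn.execute(f"DELETE FROM pipeline_run {where}", ids)
    return len(ids)


conn = sqlite3.connect(":memory:")
run_columns = ("(run_id INTEGER PRIMARY KEY, parent_id INTEGER, "
               "owner TEXT, status TEXT, end_date TEXT)")
conn.execute(f"CREATE TABLE pipeline_run {run_columns}")
conn.execute(f"CREATE TABLE run_archive {run_columns}")
conn.execute("CREATE TABLE archive_run_status_change (run_id INTEGER, status TEXT)")
for table in DEPENDENTS:
    extra = ", status TEXT" if table == "run_status_change" else ""
    conn.execute(f"CREATE TABLE {table} (run_id INTEGER{extra})")
conn.executemany("INSERT INTO pipeline_run VALUES (?, ?, ?, ?, ?)", [
    (1, None, "alice", "SUCCESS", "2024-01-01T00:00:00"),  # old master run
    (2, 1, "alice", "SUCCESS", "2024-01-01T00:00:00"),     # its child run
    (3, None, "alice", "SUCCESS", "2024-07-25T00:00:00"),  # too recent
    (4, None, "alice", "RUNNING", None),                   # not terminal
    (5, None, "bob", "FAILURE", "2024-01-01T00:00:00"),    # other owner
])
conn.executemany("INSERT INTO run_status_change VALUES (?, ?)",
                 [(1, "SUCCESS"), (2, "SUCCESS"), (3, "SUCCESS")])
conn.commit()

moved = archive_runs(conn, "alice", 30, datetime(2024, 8, 1), 1000)
live = [r[0] for r in conn.execute("SELECT run_id FROM pipeline_run ORDER BY run_id")]
archived = [r[0] for r in conn.execute("SELECT run_id FROM run_archive ORDER BY run_id")]
print(moved, live, archived)
```

Copying into the archive tables before deleting, inside one transaction, means a failure part-way through leaves the live tables untouched. A real implementation would call such a step repeatedly until no master runs remain for the owner.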

  • add a new API method POST /runs/archive/owners that migrates runs for specified user or group. Request parameters:
    • ownerSid - user name/id or group name/id
    • principal - true for users
    • days - optional. If not specified, the number of days shall be loaded from the metadata; if no corresponding metadata is specified for the requested user or group, an error shall occur. If the days parameter is specified, this value shall be used to process runs regardless of whether a value is present in the metadata.
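
The precedence of the days parameter over metadata can be illustrated with a short sketch. The function `effective_days`, the shape of the `metadata_days` lookup, and the error type are assumptions; only the parameter names ownerSid, principal and days come from the description above.

```python
from typing import Dict, Optional, Tuple


def effective_days(
    owner_sid: str,
    principal: bool,
    days: Optional[int],
    metadata_days: Dict[Tuple[str, bool], int],
) -> int:
    """Pick the number of days for a POST /runs/archive/owners request."""
    # An explicit request value is used even if metadata also defines one.
    if days is not None:
        return days
    key = (owner_sid, principal)
    if key not in metadata_days:
        kind = "user" if principal else "group"
        raise ValueError(f"No archive metadata found for {kind} '{owner_sid}'")
    return metadata_days[key]


# Hypothetical metadata: (ownerSid, principal) -> run_archive_days
metadata_days = {("alice", True): 60, ("ROLE_A", False): 90}

explicit = effective_days("alice", True, 7, metadata_days)
from_metadata = effective_days("ROLE_A", False, None, metadata_days)
try:
    effective_days("bob", True, None, metadata_days)
    error_message = None
except ValueError as error:
    error_message = str(error)
print(explicit, from_metadata, error_message)
```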

This operation shall be available for admin users only.

NOTES:

  1. archive runs operations shall be asynchronous. Introduce a new executor, BackgroundJobs. The executor's pool size shall be configurable via the background.api.jobs.pool.size=${CP_API_BACKGROUND_JOBS_POOL_SIZE:10} application property.
  2. the API shall be able to perform the archive runs operation on a large number of owners/runs. To prevent performance or DB query issues, the data shall be divided into chunks by:
    2.1. owners - load runs by owners in portions. Add a new system preference system.archive.run.owners.chunk.size (default: 100) to change the chunk size.
    2.2. runs - load master runs with a limit and delete dependents for the limited data only. Add a new system preference system.archive.run.runs.chunk.size (default: 1000) to change this limit.
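
The two notes above, asynchronous execution and chunked processing, could be combined along these lines. This is a sketch under assumptions: a Python thread pool stands in for the BackgroundJobs executor, and the `chunked`, `archive_all` and `process_chunk` helpers are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def archive_all(owners, owners_chunk_size, process_chunk):
    """Process owners chunk by chunk and return the total processed count."""
    processed = 0
    for chunk in chunked(owners, owners_chunk_size):
        processed += process_chunk(chunk)
    return processed


# Pool size taken from the environment, defaulting to 10 as in the property above.
pool_size = int(os.environ.get("CP_API_BACKGROUND_JOBS_POOL_SIZE", "10"))
executor = ThreadPoolExecutor(max_workers=pool_size)

owners = [f"user{i}" for i in range(250)]
chunk_sizes = []


def process_chunk(chunk):
    # Stand-in for loading and archiving the runs of these owners.
    chunk_sizes.append(len(chunk))
    return len(chunk)


# The caller gets a future back immediately; the work runs in the background.
future = executor.submit(archive_all, owners, 100, process_chunk)
total = future.result()
executor.shutdown()
print(total, chunk_sizes)
```

With 250 owners and a chunk size of 100, the job processes three chunks of 100, 100 and 50 owners.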

2. Monitoring service

Add a new monitoring class that initiates runs data migration. Support preferences:

  • monitoring.archive.runs.delay (default: 1 day in ms) - to control the monitoring frequency
  • monitoring.archive.runs.enable (default: off) - to enable the archive runs monitor
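
As a rough illustration of how the two preferences interact, the sketch below decides whether the monitor should trigger a migration. The function `archive_monitor_due` and its arguments are hypothetical; the actual monitoring service may schedule the task differently.

```python
from typing import Optional

# Default for monitoring.archive.runs.delay: 1 day expressed in milliseconds.
DEFAULT_DELAY_MS = 24 * 60 * 60 * 1000


def archive_monitor_due(
    enabled: bool,
    last_run_ms: Optional[int],
    now_ms: int,
    delay_ms: int = DEFAULT_DELAY_MS,
) -> bool:
    """Return True if the archive runs monitor should call POST /runs/archive."""
    # monitoring.archive.runs.enable is off by default, so nothing happens.
    if not enabled:
        return False
    # Trigger on the first pass, then once per configured delay.
    return last_run_ms is None or now_ms - last_run_ms >= delay_ms


print(DEFAULT_DELAY_MS)
print(archive_monitor_due(False, None, 0))
print(archive_monitor_due(True, None, 0))
print(archive_monitor_due(True, 0, DEFAULT_DELAY_MS - 1))
print(archive_monitor_due(True, 0, DEFAULT_DELAY_MS))
```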

3. Billing service

Call GET run/activity with the archive flag enabled to support pricing of archived runs.
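
The effect of the archive flag on the loaded data could be sketched as below, again with SQLite. The function `load_run_activity` and the column names `run_id`, `status` and `changed_at` are assumptions; the sketch covers only status changes, while the real method would also load run data from run_archive.

```python
import sqlite3


def load_run_activity(conn, start, end, archive=False):
    """Load status changes for a period, optionally including archived ones."""
    select = "SELECT run_id, status, changed_at FROM {} WHERE changed_at BETWEEN ? AND ?"
    query = select.format("run_status_change")
    params = [start, end]
    if archive:
        # With the flag on, archived status changes are appended to the result.
        query += " UNION ALL " + select.format("archive_run_status_change")
        params += [start, end]
    query += " ORDER BY changed_at, run_id"
    return conn.execute(query, params).fetchall()


conn = sqlite3.connect(":memory:")
for table in ("run_status_change", "archive_run_status_change"):
    conn.execute(f"CREATE TABLE {table} (run_id INTEGER, status TEXT, changed_at TEXT)")
conn.execute("INSERT INTO run_status_change VALUES (3, 'RUNNING', '2024-07-25')")
conn.execute("INSERT INTO archive_run_status_change VALUES (1, 'SUCCESS', '2024-01-01')")
conn.commit()

live_only = load_run_activity(conn, "2024-01-01", "2024-12-31")
with_archive = load_run_activity(conn, "2024-01-01", "2024-12-31", archive=True)
print(live_only)
print(with_archive)
```

Keeping the flag off by default means existing callers keep querying only the live tables; only the billing service pays the cost of reading the archive.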

@ekazachkova ekazachkova added the kind/enhancement New feature or request label Jul 29, 2024
ekazachkova added a commit that referenced this issue Aug 1, 2024
ekazachkova added a commit that referenced this issue Aug 6, 2024
ekazachkova added a commit that referenced this issue Aug 6, 2024
ekazachkova added a commit that referenced this issue Aug 6, 2024
ekazachkova added a commit that referenced this issue Aug 6, 2024
@ekazachkova ekazachkova self-assigned this Aug 7, 2024
@ekazachkova ekazachkova changed the title Implement runs removal according to user/group configuration Migrate runs to archive table according to user/group configuration Aug 7, 2024
ekazachkova added a commit that referenced this issue Aug 7, 2024
ekazachkova added a commit that referenced this issue Aug 8, 2024
ekazachkova added a commit that referenced this issue Aug 8, 2024
…nfiguration - async and limit for query support
ekazachkova added a commit that referenced this issue Aug 8, 2024
ekazachkova added a commit that referenced this issue Aug 8, 2024
ekazachkova added a commit that referenced this issue Aug 9, 2024
…nfiguration - fix for async and transactions
ekazachkova added a commit that referenced this issue Aug 12, 2024
ekazachkova added a commit that referenced this issue Aug 12, 2024
…nfiguration - introduce common background jobs executor
ekazachkova added a commit that referenced this issue Aug 12, 2024
ekazachkova added a commit that referenced this issue Aug 13, 2024
…3635)

* Issue #3619: Migrate runs to archive table according to user/group configuration - API for archive run

* Issue #3619: Migrate runs to archive table according to user/group configuration - monitoring part

* Issue #3619: Migrate runs to archive table according to user/group configuration - manual archive runs

* Issue #3619: Migrate runs to archive table according to user/group configuration - billing service

* Issue #3619: Migrate runs to archive table according to user/group configuration - cleanups

* Issue #3619: Migrate runs to archive table according to user/group configuration - test update

* Issue #3619: Migrate runs to archive table according to user/group configuration - async and limit for query support

* Issue #3619: Migrate runs to archive table according to user/group configuration - archive run statuses

* Issue #3619: Migrate runs to archive table according to user/group configuration - archive run statuses

* Issue #3619: Migrate runs to archive table according to user/group configuration - fix for async and transactions

* Issue #3619: Migrate runs to archive table according to user/group configuration - debug logs extended

* Issue #3619: Migrate runs to archive table according to user/group configuration - introduce common background jobs executor

* Issue #3619: Migrate runs to archive table according to user/group configuration - owners batch
NShaforostov added a commit that referenced this issue Sep 2, 2024
* (Issue #3619) 'Runs archiving' doc
* (Issue #3573) 'Container limits' doc
* (Issue #3568) 'Compose a Dockerfile' doc
* (Issue #3576) 'GPU statistics monitor' doc
* (Issue #3602) 'Pod network consumption alert and restriction' doc
NShaforostov (Collaborator) commented

Docs added via #3669 and located here.

@NShaforostov NShaforostov added the state/has-doc Issues that have documentation label Dec 27, 2024