Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Option to Prevent Overlapping Replications of the Same Artifact in Harbor #21347

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

bupd
Copy link
Member

@bupd bupd commented Dec 22, 2024

Summary

This PR addresses a long-standing issue where overlapping replications of the same artifact can occur in Harbor, leading to unnecessary resource consumption and poor performance. By introducing a "Disable parallel replication" checkbox in the replication policy, it ensures that replication tasks for the same policy do not run in parallel, preventing bandwidth overload and queue backups, especially for large artifacts.

Similar Issues

Related Issues

Why do we need this

  1. Users have for long requesting this feature to have single replication execution per policy.
  2. Avoid overlapping replications which is common in using scheduled replications.
  3. Without this replication starts piling up.
  4. And at some point it becomes un-handleable... leading to a snowball effect on bandwidth and the queue.
  5. It makes no sense to run replication in parallel for same replication policy.
  6. It is better to atleast have an option to set replication to single execution at a time per policy.
  7. It is a huge problem when using bigger artifacts that are more than 80gb in size running parallel replication.

Changes Made

  • Added Disable parallel replication Checkbox in Replication UI.
  • Added Execution skipping.
  • Updated Replication Policy.
  • Updated worker.
  • Added skip_if_running column in sql.

Screenshots

add skip if running

image

image

Todo

  • Add Tests

Please indicate you've done the following:

  • Well Written Title and Summary of the PR
  • Label the PR as needed. "release-note/ignore-for-release, release-note/new-feature, release-note/update, release-note/enhancement, release-note/community, release-note/breaking-change, release-note/docs, release-note/infra, release-note/deprecation"
  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Made sure tests are passing and test coverage is added if needed.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.

Maksym Trofimenko and others added 4 commits December 22, 2024 21:45
Signed-off-by: bupd <[email protected]>
Co-authored-by: Maksym Trofimenko <[email protected]>
Signed-off-by: bupd <[email protected]>
Co-authored-by: Maksym Trofimenko <[email protected]>
* Adds Skip If Runnning to Replication Policy

Signed-off-by: bupd <[email protected]>
* Adds Checkbox in replication policy UI
* Updates swagger & sql schema

Signed-off-by: bupd <[email protected]>
@bupd bupd added the release-note/new-feature New Harbor Feature label Dec 22, 2024
Copy link

codecov bot commented Dec 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.12%. Comparing base (c8c11b4) to head (a997836).
Report is 340 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #21347      +/-   ##
==========================================
+ Coverage   45.36%   46.12%   +0.75%     
==========================================
  Files         244      247       +3     
  Lines       13333    13883     +550     
  Branches     2719     2875     +156     
==========================================
+ Hits         6049     6403     +354     
- Misses       6983     7141     +158     
- Partials      301      339      +38     
Flag Coverage Δ
unittests 46.12% <ø> (+0.75%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 491 files with indirect coverage changes

@bupd bupd force-pushed the feat/add-job-skipper branch from 6763a2c to a997836 Compare December 22, 2024 17:43
@bupd bupd marked this pull request as ready for review December 22, 2024 18:23
@bupd bupd requested a review from a team as a code owner December 22, 2024 18:23
@wy65701436
Copy link
Contributor

Thanks, @bupd, for your contribution! I believe this is a valuable new feature for Harbor that should follow the proposal process.

There are several questions we need to briefly discuss:

  • What happens if the execution gets stuck in the 'running' status?
  • How do we determine the relationship between the replication policy and the execution instance — by ID? And what happens if the policy content is updated after the execution begins?
  • During the replication job (at least in Harbor-to-Harbor replication), Harbor skips artifacts that have already been successfully replicated. So, we should consider how much additional benefit this feature will bring.

@@ -7488,6 +7488,10 @@ definitions:
type: boolean
description: Whether to enable copy by chunk.
x-isnullable: true
skip_if_running:
type: boolean
description: Whether to enable skip, if replication already running.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text is misleading, clearly explain what this option does, providing enough context, background, and result of enabling this future: e.g "Don't start a new replication until the previous one is terminated"

@@ -76,6 +76,7 @@
"PULL_BASED": "Pull the resources from the remote registry to the local Harbor.",
"DESTINATION_NAMESPACE": "Specify the destination namespace. If empty, the resources will be put under the same namespace as the source.",
"OVERRIDE": "Specify whether to override the resources at the destination if a resource with the same name exists.",
"SKIP_IF_RUNNING": "Specify whether to skip the execution when replication already running.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad wording, intention and outcome not clear

@@ -560,6 +561,7 @@
"ALLOWED_CHARACTERS": "Allowed special characters",
"TOTAL": "Total",
"OVERRIDE": "Override",
"SKIP_IF_RUNNING": "Skip if running",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad wording

@Vad1mo Vad1mo enabled auto-merge (squash) December 23, 2024 16:50
@bupd bupd changed the title Add Option to Skip Replication Policy Execution If an Execution is already running feat: Add Option to Prevent Overlapping Replications of the Same Artifact in Harbor Dec 23, 2024
@tpoxa
Copy link
Contributor

tpoxa commented Dec 28, 2024

Thanks, @bupd, for your contribution! I believe this is a valuable new feature for Harbor that should follow the proposal process.

There are several questions we need to briefly discuss:

  • What happens if the execution gets stuck in the 'running' status?

We were planning to use JobServiceMonitorClient to get the current state of job observations. As long observation says it's running - a new execution will be skipped. I believe we should fix the problem with stuck separately. What are your thoughts about that?

  • How do we determine the relationship between the replication policy and the execution instance — by ID? And what happens if the policy content is updated after the execution begins?

Indeed PolicyID was never needed before. We can add it anyway to the job arguments so we can scan observations, parse job arguments and match policyID with the current.

  • During the replication job (at least in Harbor-to-Harbor replication), Harbor skips artifacts that have already been successfully replicated. So, we should consider how much additional benefit this feature will bring.

I believe even checking an artefact for existence produces an additonal HTTP request. So, we still adding some extra efficiency.

@Vad1mo
Copy link
Member

Vad1mo commented Dec 30, 2024

  • During the replication job (at least in Harbor-to-Harbor replication), Harbor skips artifacts that have already been successfully replicated. So, we should consider how much additional benefit this feature will bring.

Regarding that statement. there is a fundamental deficit with harbor replications, when the layer is in the copy process, new replication will try to copy the same blob again. If you have a slow connection like DSL the same blob will be copied over and over again. This results in the domino effect. Each new replication halves the throughput, resulting in slower transfers, and because all replications run in parallel, the same effect happens with the next blob/layer.

We have made an experiment with:

  • Copying 1 GiB image
  • Limit transfer speed 1 Mbit, per replication
    • Please note that we set the 1Mib per replication. On a real DSL, 3G connection, the total bandwidth (1 Mbit) would not be shared across all replications, so each replication would get 1/n of the bandwidth.
    • So the observed time needs to be multiplied by n.
  • Replication every minute
  • Replication time with "Prevent Overlapping Replications" Enabled ->: xxx min
  • Replications time with "Prevent Overlapping Replications" Disabled -> : xxx min

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants