Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor namespace reconciliation to ensure openshift rbac requisites #2468

Conversation

anithapriyanatarajan
Copy link
Contributor

@anithapriyanatarajan anithapriyanatarajan commented Dec 18, 2024

Changes

During upgrade of pipelines in OpenShift cluster there are additional rbac related reconciliation to be done across all namespaces in the cluster. Right now the namespace reconciliation happens one at a time in a for loop. This results in longer upgrade time. This PR improvises this by

  • Utilising Patch in place of Update
  • Refactor to improve code modularity
  • Bulk update of ClusterInterceptor ClusterRoleBinding(CRB) with namespaces that are processed succesfully instead of updating the CRB everytime for each namespace
    [Note: The original scope included concurrency which was later dropped since there was no significant impact in time taken to process the namespaces]

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Release Notes

NONE

@tekton-robot tekton-robot added the release-note-none Denotes a PR that doesnt merit a release note. label Dec 18, 2024
@tekton-robot tekton-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 18, 2024
@tekton-robot
Copy link
Contributor

Hi @anithapriyanatarajan. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 18, 2024
@pratap0007
Copy link
Contributor

/ok-to-test

@tekton-robot tekton-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 19, 2024
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 836f97a to 397b47a Compare December 19, 2024 12:50
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive question (on this and in some other places in general), shouldn't we use PATCH more often ? It would reduce the possibility to get "this object ways already update" type of failures.

pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 397b47a to 1007e40 Compare December 20, 2024 15:26
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 20, 2024
@tekton-robot
Copy link
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/common/utils.go 85.7% 83.3% -2.4

pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 1007e40 to e2b0b7e Compare January 6, 2025 11:38
@tekton-robot
Copy link
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/common/utils.go 85.7% 83.3% -2.4

@jkhelil
Copy link
Member

jkhelil commented Jan 6, 2025

@anithapriyanatarajan have you tested upgrade scenario with this PR ?

  • Clone Operatror from main branch
  • install Operator make apply TARGET=openshift
  • git checkout your current branch
  • change VERSION to someother value in config/openshift/base/operator.yaml file
  • make apply TARGET=openshift again
    you would eventually,
  • create a high number of namespaces (before the first make apply)
  • save and compare clusterinterceptore role binding object before and after the upgrade
  • other checks ...?

@anithapriyanatarajan
Copy link
Contributor Author

@anithapriyanatarajan have you tested upgrade scenario with this PR ?

  • Clone Operatror from main branch
  • install Operator make apply TARGET=openshift
  • git checkout your current branch
  • change VERSION to someother value in config/openshift/base/operator.yaml file
  • make apply TARGET=openshift again
    you would eventually,
  • create a high number of namespaces (before the first make apply)
  • save and compare clusterinterceptore role binding object before and after the upgrade
  • other checks ...?

@jkhelil - Thanks for the detailed comments. The following steps were performed to test the upgrade.

  1. Create new openshift cluster
  2. Create a large number of namespaces.
    For testing, 500 namespaces were created with below command
    $ seq 1 500 | xargs -P 20 -I {} kubectl create namespace namespace-{}
  3. Clone Operator main branch
  4. Install make apply TARGET=openshift
  5. Observe & capture the logs, operator pod CPU usage, tektonconfig, clusterinterceptor rolebinding yaml
  6. Update the Version to random value & make apply again
  7. Observe & capture the logs, operator pod CPU usage, tektonconfig, clusterinterceptor rolebinding yaml
    Key observation:
    config, rolebinding were all updated as expected. Duration for upgrade of ~500 namespaces: 2m54.381192967s
  8. Check out the PR
    $ gh pr checkout 2468
  9. Update the Version to new value & make apply again
  10. Observe & capture the logs, operator pod CPU usage, tektonconfig, clusterinterceptor rolebinding yaml
    Key observation:
    config, rolebinding were all updated as expected. Duration for upgrade: 2m25.410893374s with concurrency of handling 75 namespaces.

Though the refactored code is working fine for upgrade, the improvement in duration is very minimal, hope that is expected. For all the cases observed CPU and memory metrics from Openshift Metrics dashboard

@jkhelil
Copy link
Member

jkhelil commented Jan 9, 2025

@anithapriyanatarajan
Have you done a test with fresh install on openshift ?
I have an issue running the PR on openshift

operator git:(openshift-rbac-concurrent-ns-reconcilation) oc get tektonconfig
NAME     VERSION   READY   REASON
config   devel     False   PreReconciliation failed with message: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml
➜

@jkhelil
Copy link
Member

jkhelil commented Jan 9, 2025

@anithapriyanatarajan

  • first thanks for all the work you have done so far
  • regarding your great conclusion that concurrency doesnt brings any significant enhancement, my opinion is that we should avoid employ it there, it is useless, but we should use some of your work to
  • rewrite the createResource method, as you suggested (process namespace, in each own function, and so one)
  • bulk update the clusterrolebnding
  • use patch instead of update (be careful here, i think the issue iam facing in fresh install is may be related to this)

@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from e2b0b7e to 5f86862 Compare January 10, 2025 08:53
@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 10, 2025
@tekton-robot
Copy link
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/common/utils.go 85.7% 83.3% -2.4

@anithapriyanatarajan
Copy link
Contributor Author

/test pull-tekton-operator-build-tests

@anithapriyanatarajan
Copy link
Contributor Author

@anithapriyanatarajan

  • first thanks for all the work you have done so far
  • regarding your great conclusion that concurrency doesnt brings any significant enhancement, my opinion is that we should avoid employ it there, it is useless, but we should use some of your work to
  • rewrite the createResource method, as you suggested (process namespace, in each own function, and so one)
  • bulk update the clusterrolebnding
  • use patch instead of update (be careful here, i think the issue iam facing in fresh install is may be related to this)

@jkhelil

Removed the concurrency and mutex aspects from the PR. The resulting changes are just meant for refactoring for modularity and incorporate enhancements to bulk update clusterrolebinidng and utilising Patch instead of Update. Please review and share your views

@anithapriyanatarajan
Copy link
Contributor Author

@anithapriyanatarajan Have you done a test with fresh install on openshift ? I have an issue running the PR on openshift

operator git:(openshift-rbac-concurrent-ns-reconcilation) oc get tektonconfig
NAME     VERSION   READY   REASON
config   devel     False   PreReconciliation failed with message: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml
➜

Thank you very much for the catch. This issue happens when there is a fresh install of the new version in which case Patch of exising tektonconfig is triggered to update labels. I had used a deprecated type 'StrategicMergePatchType' instead of 'MergePatchTpe' which has created the issue. Fixed this now.

@anithapriyanatarajan
Copy link
Contributor Author

/test pull-tekton-operator-build-tests

pkg/reconciler/common/utils.go Outdated Show resolved Hide resolved
pkg/reconciler/openshift/tektonconfig/extension.go Outdated Show resolved Hide resolved
@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 5f86862 to 505ae82 Compare January 16, 2025 13:42
@tekton-robot
Copy link
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/common/utils.go 85.7% 83.3% -2.4

@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 505ae82 to 8f9a5a9 Compare January 16, 2025 18:18
@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 8f9a5a9 to 4e825de Compare January 16, 2025 18:24
@tekton-robot
Copy link
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/common/utils.go 85.7% 83.3% -2.4

@anithapriyanatarajan anithapriyanatarajan changed the title concurrent namespace reconciliation to ensure openshift rbac requisites refactor namespace reconciliation to ensure openshift rbac requisites Jan 17, 2025
@jkhelil
Copy link
Member

jkhelil commented Jan 17, 2025

/approve

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkhelil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 17, 2025
@jkhelil
Copy link
Member

jkhelil commented Jan 17, 2025

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2025
@tekton-robot tekton-robot merged commit 9adaac7 into tektoncd:main Jan 17, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesnt merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants