[Bug]: The old family Provider and ProviderRevision left when manually installed #1452

Open
1 task done
pierluigilenoci opened this issue Aug 8, 2024 · 7 comments
Labels
bug Something isn't working needs:triage

Comments

@pierluigilenoci

pierluigilenoci commented Aug 8, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

No resources affected

Resource MRs required to reproduce the bug

No resources needed

Steps to Reproduce

To reproduce the problem:

  1. Install the providers WITHOUT family provider
  2. Once the automatic family provider appears, try to install the family provider manually
  3. Find a way to make the automatic family Provider and ProviderRevision manifests disappear

What happened?

I expected the automatically installed family provider to disappear once the family provider was installed manually, but both remain.

Of course, everything works on a clean cluster.
I am looking for a way to switch from automatic to manual family provider without reinstalling Crossplane in a cluster where there are already MRs used in production.

This is a follow-up bug concerning #1088

More details in the Slack discussion: https://crossplane.slack.com/archives/C05E0UE46S2/p1722852504359609

Relevant Error Output Snippet

kubectl get providers.pkg.crossplane.io
NAME                          INSTALLED   HEALTHY   PACKAGE                                                          AGE
provider-aws-cloudwatchlogs   True        True      xpkg.upbound.io/upbound/provider-aws-cloudwatchlogs:v1.6.0       283d
provider-aws-dynamodb         True        True      xpkg.upbound.io/upbound/provider-aws-dynamodb:v1.6.0             109d
provider-aws-ec2              True        True      xpkg.upbound.io/upbound/provider-aws-ec2:v1.6.0                  389d
provider-aws-elasticache      True        True      xpkg.upbound.io/upbound/provider-aws-elasticache:v1.6.0          389d
provider-aws-iam              True        True      xpkg.upbound.io/upbound/provider-aws-iam:v1.6.0                  389d
provider-aws-mq               True        True      xpkg.upbound.io/upbound/provider-aws-mq:v1.6.0                   389d
provider-aws-rds              True        True      xpkg.upbound.io/upbound/provider-aws-rds:v1.6.0                  389d
provider-aws-s3               True        True      xpkg.upbound.io/upbound/provider-aws-s3:v1.6.0                   389d
provider-family-aws           True        True      xpkg.upbound.io/upbound/provider-family-aws:v1.6.0               5d1h
provider-kubernetes           True        True      xpkg.upbound.io/crossplane-contrib/provider-kubernetes:v0.14.0   301d
provider-sql                  True        True      xpkg.upbound.io/crossplane-contrib/provider-sql:v0.9.0           287d
provider-terraform            True        True      xpkg.upbound.io/upbound/provider-terraform:v0.16.0               389d
upbound-provider-family-aws   True        False     xpkg.upbound.io/upbound/provider-family-aws:v1.10.0              3d20h

Crossplane Version

v1.16.0-up.1

Provider Version

1.6.0

Kubernetes Version

v1.27.15

Kubernetes Distribution

EKS

Additional Info

This is a follow-up bug concerning #1088
@haarchri
Member

haarchri commented Aug 8, 2024

Did you try the following? It is based on https://docs.upbound.io/providers/migration/#migrating-from-monolithic-to-family-official-providers, which comes to the same thing in the end.

Please try it in a test cluster first ;)

Set Revision Activation Policy to Manual:

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: manual-provider-family-aws
spec:
  package: xpkg.upbound.io/upbound/provider-family-aws:v1.10.0
  revisionActivationPolicy: Manual

Verify Provider Installation and Health Status:

Confirm that the "manual family provider" is INSTALLED: False and HEALTHY: True.
Run the following command to check the status:

kubectl get providers
NAME                         INSTALLED   HEALTHY   PACKAGE                                               AGE
manual-provider-family-aws   False       True      xpkg.upbound.io/upbound/provider-family-aws:v1.10.0

Delete the Automatic Family Provider:
kubectl delete provider.pkg upbound-provider-family-aws

After removing the automatic provider, update the revisionActivationPolicy for the manual-provider-family-aws from Manual to Automatic. This change will allow the provider to automatically manage its resources as needed.
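
In a GitOps workflow, that last step could be done by re-applying the manifest from the first step with the policy flipped. This is only a sketch that reuses the name and package shown above; adjust both to your cluster.

```yaml
# Same Provider as in the first step, now with automatic revision activation.
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: manual-provider-family-aws
spec:
  package: xpkg.upbound.io/upbound/provider-family-aws:v1.10.0
  revisionActivationPolicy: Automatic
```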

@darioef

darioef commented Aug 9, 2024

Same problem here.

I tried @haarchri's suggestion, but the manual provider never reaches HEALTHY: True because it reports that the automatic one still exists.

status:
  conditions:
    - lastTransitionTime: '2024-08-09T12:18:03Z'
      message: >-
        cannot resolve package dependencies: cannot initialize dependency graph
        from the packages in the lock: node
        xpkg.upbound.io/upbound/provider-family-aws already exists
      reason: UnknownPackageRevisionHealth
      status: Unknown
      type: Healthy

I need to manually install the provider-family-aws because I realized that I'm running it on version v0.38.0, while the other providers (S3, Route53, etc.) are on version v1.0.0. Honestly, I don't know what happened, but it seems that even if I update the version of the Upbound AWS Providers image, provider-family-aws still remains on the old version and continues to install automatically with that version.

@haarchri
Member

haarchri commented Aug 9, 2024

Can you remove your Lock resource (remove its finalizer first)? Can you also send the Provider and ProviderRevision?

@darioef

darioef commented Aug 9, 2024

You're the man. It worked!

So, the steps are the same as #1452 (comment) but you need to remove the Lock resource after you install the manual-provider-family-aws.

Thanks for your help.

@turkenh
Contributor

turkenh commented Aug 9, 2024

The complication here originated from deploying the same provider package twice (one already existing as a dependency and another installed manually). In the issue description, both provider-family-aws and upbound-provider-family-aws try to deploy xpkg.upbound.io/upbound/provider-family-aws under the hood and conflict with each other.

Ideally, if you already have the family provider deployed as a dependency and you want to change something, e.g. configure a DeploymentRuntimeConfig, the way to go is to edit or patch the existing provider instead of deploying a separate Provider object with a different name. In the scenario here, the provider named upbound-provider-family-aws could have been modified with spec.runtimeConfigRef in the first place.
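
As an illustration of that approach, the existing Provider object could be updated to point at a runtime config. This is a sketch, not a tested manifest: the name family-aws-runtime is hypothetical, and the package line should match whatever the automatically created provider currently uses.

```yaml
# Existing, automatically created family provider, now referencing a
# DeploymentRuntimeConfig. "family-aws-runtime" is a hypothetical name.
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: upbound-provider-family-aws
spec:
  package: xpkg.upbound.io/upbound/provider-family-aws:v1.10.0
  runtimeConfigRef:
    name: family-aws-runtime
```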

@darioef

darioef commented Aug 9, 2024

Thanks for your explanation. In fact, another reason I want to install provider-family-aws manually is that when it's installed as a Provider dependency (e.g., of the S3 provider), it doesn't respect the configured DeploymentRuntimeConfig and applies the default one instead. I need to use my RuntimeConfig because I've set tolerations and other deployment settings there.
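
For readers unfamiliar with the resource, a DeploymentRuntimeConfig carrying tolerations might look roughly like the following. It is an assumed example, not my actual config: the name and the toleration key, value, and effect are all hypothetical.

```yaml
# Hypothetical runtime config; name and toleration values are placeholders.
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: family-aws-runtime
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          containers:
            - name: package-runtime
          tolerations:
            - key: dedicated
              operator: Equal
              value: crossplane
              effect: NoSchedule
```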

I think this comment talks about the same problem: #1088 (comment)

@pierluigilenoci
Author

@haarchri, thank you a lot.

I managed to clean the clusters, but not without some difficulty, because the suggestion did not fully work.

The complete list of operations needed was:

  • set the revisionActivationPolicy to Manual in both installed Family Providers
  • remove the finalizer from the Lock resource
  • delete the Lock (if not already gone after removing the finalizer)
  • delete the old Provider and ProviderRevision

Often, this was enough. Sometimes, I had to do it twice to clean the cluster.
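
The operations above can be sketched as a small shell function. Treat it as a rough outline to try in a test cluster first, not a verified procedure: the function name and the KUBECTL override are hypothetical helpers, and it assumes the Lock object is named "lock".

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print the commands
# instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: cleanup_family_provider OLD_PROVIDER NEW_PROVIDER
cleanup_family_provider() {
  local old="$1" new="$2"

  # 1. Set revisionActivationPolicy to Manual on both family Providers.
  for p in "$old" "$new"; do
    $KUBECTL patch providers.pkg.crossplane.io "$p" --type merge \
      -p '{"spec":{"revisionActivationPolicy":"Manual"}}'
  done

  # 2. Remove the finalizers from the Lock (assumed to be named "lock").
  $KUBECTL patch locks.pkg.crossplane.io lock --type merge \
    -p '{"metadata":{"finalizers":null}}'

  # 3. Delete the Lock if it is still there.
  $KUBECTL delete locks.pkg.crossplane.io lock --ignore-not-found

  # 4. Delete the old Provider, then list ProviderRevisions so any
  #    leftover revision can be deleted by name.
  $KUBECTL delete providers.pkg.crossplane.io "$old"
  $KUBECTL get providerrevisions.pkg.crossplane.io
}
```

With the names from this issue, the call would be `cleanup_family_provider upbound-provider-family-aws provider-family-aws`; running it a second time corresponds to the "do it twice" case.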

@turkenh, this is not a complication but a plausible scenario.

Anyone who initially installs the AWS family providers but then realizes, for whatever reason, that they need to assign a DeploymentRuntimeConfig to the automatically created provider ends up in precisely this situation.

Manually editing the provider is not a plausible solution in a fully GitOps approach, so a more integrated solution would make a lot of sense.

Furthermore, users should be able to choose their provider's name without impositions or hard-coded names.

This bug is far from being solved because what we have is just a workaround, not a fix for the problem.


4 participants