[Bug]: The old family Provider and ProviderRevision left when manually installed #1452

pierluigilenoci · 2024-08-08T11:32:46Z

Is there an existing issue for this?

I have searched the existing issues

Affected Resource(s)

No resources affected

Resource MRs required to reproduce the bug

No resources needed

Steps to Reproduce

To reproduce the problem:

1. Install the providers WITHOUT the family provider.
2. Once the automatic family provider appears, try to install the family provider manually.
3. Find a way to make the automatic family Provider and ProviderRevision manifests disappear.

What happened?

I expected the automatically installed version to disappear once the family provider was installed manually, but both remain in the cluster.

Of course, everything works on a clean cluster.
I am looking for a way to switch from automatic to manual family provider without reinstalling Crossplane in a cluster where there are already MRs used in production.

This is a follow-up bug concerning #1088

More details in the Slack discussion: https://crossplane.slack.com/archives/C05E0UE46S2/p1722852504359609

Relevant Error Output Snippet

kubectl get providers.pkg.crossplane.io
NAME                          INSTALLED   HEALTHY   PACKAGE                                                          AGE
provider-aws-cloudwatchlogs   True        True      xpkg.upbound.io/upbound/provider-aws-cloudwatchlogs:v1.6.0       283d
provider-aws-dynamodb         True        True      xpkg.upbound.io/upbound/provider-aws-dynamodb:v1.6.0             109d
provider-aws-ec2              True        True      xpkg.upbound.io/upbound/provider-aws-ec2:v1.6.0                  389d
provider-aws-elasticache      True        True      xpkg.upbound.io/upbound/provider-aws-elasticache:v1.6.0          389d
provider-aws-iam              True        True      xpkg.upbound.io/upbound/provider-aws-iam:v1.6.0                  389d
provider-aws-mq               True        True      xpkg.upbound.io/upbound/provider-aws-mq:v1.6.0                   389d
provider-aws-rds              True        True      xpkg.upbound.io/upbound/provider-aws-rds:v1.6.0                  389d
provider-aws-s3               True        True      xpkg.upbound.io/upbound/provider-aws-s3:v1.6.0                   389d
provider-family-aws           True        True      xpkg.upbound.io/upbound/provider-family-aws:v1.6.0               5d1h
provider-kubernetes           True        True      xpkg.upbound.io/crossplane-contrib/provider-kubernetes:v0.14.0   301d
provider-sql                  True        True      xpkg.upbound.io/crossplane-contrib/provider-sql:v0.9.0           287d
provider-terraform            True        True      xpkg.upbound.io/upbound/provider-terraform:v0.16.0               389d
upbound-provider-family-aws   True        False     xpkg.upbound.io/upbound/provider-family-aws:v1.10.0              3d20h


Crossplane Version

v1.16.0-up.1

Provider Version

1.6.0

Kubernetes Version

v1.27.15

Kubernetes Distribution

EKS

Additional Info

This is a follow-up bug concerning #1088

haarchri · 2024-08-08T14:06:14Z

Did you try the following? It is based on https://docs.upbound.io/providers/migration/#migrating-from-monolithic-to-family-official-providers, which amounts to the same procedure.

Please try it in a test cluster first ;)

Set Revision Activation Policy to Manual:

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: manual-provider-family-aws
spec:
  package: xpkg.upbound.io/upbound/provider-family-aws:v1.10.0
  revisionActivationPolicy: Manual

Verify Provider Installation and Health Status:

Confirm that the "manual family provider" is INSTALLED: False and HEALTHY: True.
Run the following command to check the status:

kubectl get providers
NAME                         INSTALLED   HEALTHY   PACKAGE                                               AGE
manual-provider-family-aws   False       True      xpkg.upbound.io/upbound/provider-family-aws:v1.10.0

Delete the Automatic Family Provider:
kubectl delete provider.pkg upbound-provider-family-aws

After removing the automatic provider, change the revisionActivationPolicy of manual-provider-family-aws from Manual to Automatic. With the Automatic policy, Crossplane activates the provider's revision so that it takes over from the one that was removed.
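
As a rough sketch, that last step could be done by re-applying the manifest from the first step with only the activation policy changed. The name and package version are the ones used above; nothing else is assumed.

```yaml
# Same Provider object as in the first step; only revisionActivationPolicy differs.
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: manual-provider-family-aws
spec:
  package: xpkg.upbound.io/upbound/provider-family-aws:v1.10.0
  # Automatic lets Crossplane activate the provider's current revision.
  revisionActivationPolicy: Automatic
```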

darioef · 2024-08-09T12:29:15Z

Same problem here.

I tried @haarchri's suggestion, but the manual provider never reaches the HEALTHY: True state because it reports that the automatic one still exists.

status:
  conditions:
    - lastTransitionTime: '2024-08-09T12:18:03Z'
      message: >-
        cannot resolve package dependencies: cannot initialize dependency graph
        from the packages in the lock: node
        xpkg.upbound.io/upbound/provider-family-aws already exists
      reason: UnknownPackageRevisionHealth
      status: Unknown
      type: Healthy

I need to manually install the provider-family-aws because I realized that I'm running it on version v0.38.0, while the other providers (S3, Route53, etc.) are on version v1.0.0. Honestly, I don't know what happened, but it seems that even if I update the version of the Upbound AWS Providers image, provider-family-aws still remains on the old version and continues to install automatically with that version.

haarchri · 2024-08-09T12:42:58Z

Can you remove your Lock resource, removing its finalizer as well? And can you send the Provider and ProviderRevision?

darioef · 2024-08-09T12:52:53Z

You're the man. It worked!

So, the steps are the same as #1452 (comment) but you need to remove the Lock resource after you install the manual-provider-family-aws.

Thanks for your help.

turkenh · 2024-08-09T14:28:51Z

The complication here originated from deploying the same provider package twice (one already existing as a dependency and another installed manually). In the issue description, both provider-family-aws and upbound-provider-family-aws try to deploy xpkg.upbound.io/upbound/provider-family-aws under the hood and conflict with each other.

Ideally, if you already have the family provider deployed as a dependency and you want to change something, e.g. configure a DeploymentRuntimeConfig, the way to go is to edit or patch the existing provider instead of deploying a separate Provider object with a different name. In the scenario here, the provider named upbound-provider-family-aws could have been modified with spec.runtimeConfigRef in the first place.
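
A minimal sketch of what that could look like, assuming the automatically created provider keeps the name and package version shown in the issue description. The runtime config name `custom-runtime` is hypothetical and must refer to a DeploymentRuntimeConfig that exists in the cluster.

```yaml
# Existing, dependency-installed family provider, modified in place.
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: upbound-provider-family-aws
spec:
  package: xpkg.upbound.io/upbound/provider-family-aws:v1.10.0
  runtimeConfigRef:
    # Hypothetical name of a DeploymentRuntimeConfig you manage yourself.
    name: custom-runtime
```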

darioef · 2024-08-09T17:01:54Z

Thanks for your explanation. In fact, another reason why I want to install provider-family-aws manually is that when it is installed as a Provider dependency (e.g., of the S3 provider), it doesn't respect the configured DeploymentRuntimeConfig and applies the default one instead. I need to use my own RuntimeConfig because I've set tolerations and other deployment settings there.
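
For context, a DeploymentRuntimeConfig that carries tolerations might look roughly like the sketch below. The object name and the toleration values are hypothetical placeholders, not the settings from my cluster.

```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: custom-runtime          # hypothetical name
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          containers:
            # package-runtime is the container that runs the provider.
            - name: package-runtime
          tolerations:
            - key: dedicated      # hypothetical taint key
              operator: Equal
              value: crossplane   # hypothetical taint value
              effect: NoSchedule
```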

I think this comment talks about the same problem: #1088 (comment)

pierluigilenoci · 2024-08-21T09:14:08Z

@haarchri, thank you a lot.

I managed to clean the clusters, though not without difficulty, because the suggestion did not fully work as written.

The complete list of operations needed was:

1. Set the revisionActivationPolicy to Manual in both installed family Providers.
2. Remove the finalizer from the Lock resource.
3. Delete the Lock (if it is not already gone after removing the finalizer).
4. Delete the old Provider and ProviderRevision.
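
As a sketch of the finalizer removal in the list above, the finalizer could be cleared with a merge patch kept in a small file. This assumes the Lock is the cluster-scoped object named `lock` in `locks.pkg.crossplane.io`; the file name is hypothetical.

```yaml
# lock-finalizer-patch.yaml (hypothetical file name)
# JSON merge patch: setting finalizers to null removes the field,
# which lets a pending deletion of the Lock complete.
metadata:
  finalizers: null
```

Under those assumptions it could be applied with `kubectl patch locks.pkg.crossplane.io lock --type merge --patch-file lock-finalizer-patch.yaml`. As said earlier in the thread, try it in a test cluster first.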

Often, this was enough. Sometimes, I had to do it twice to clean the cluster.

@turkenh, this is not a complication but a plausible scenario.

Anyone who initially installs the AWS family provider and then realizes, for whatever reason, that they need to assign a DeploymentRuntimeConfig to the automatically created provider ends up in precisely this situation.

Manually editing the provider is not a plausible solution in a fully GitOps approach, so a more integrated solution makes profound sense.

Furthermore, users should be able to choose their provider's name without impositions or hard-coded names.

This bug is far from being solved because what we have is just a workaround, not a fix for the problem.

pierluigilenoci added the bug and needs:triage labels on Aug 8, 2024

This was referenced Aug 8, 2024

Custom annotations and labels for AWS family provider pod #1088

Closed

Compatibility matrix for provider-family-aws with provider-aws-* #956

Open
