
OpenFeature is disabled? #579

Closed
agardnerIT opened this issue Dec 6, 2023 · 16 comments · Fixed by #582
Labels
bug Something isn't working

Comments

@agardnerIT
Contributor

In the same investigation as #575 and open-feature/flagd#1063

2023-12-06T23:11:23Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/mutate-v1-pod", "UID": "8f021e26-6d9b-47f2-90f2-588236ec388d", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-12-06T23:11:23Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-v1-pod", "code": 200, "reason": "OpenFeature is disabled", "UID": "8f021e26-6d9b-47f2-90f2-588236ec388d", "allowed": true}

Also noticed:

2023-12-06T23:10:31Z	ERROR	setup	podMutator backfill permissions error	{"error": "Index with name field:metadata.annotations.openfeature.dev/allowkubernetessync does not exist"}

OFO v0.5.2 deployed w/ Argo (Helm) + flagd proxy.

Application and Flag Config

apiVersion: core.openfeature.dev/v1beta1
kind: FeatureFlag
metadata:
  name: sample-flags
  namespace: openfeatureplayground-team70-preprod-cd
  labels:
    app: open-feature-demo
spec:
  flagSpec:
    flags:
      new-welcome-message:
        state: ENABLED
        variants:
          'on': true
          'off': false
        defaultVariant: 'off'
---
apiVersion: core.openfeature.dev/v1beta1
kind: FeatureFlagSource
metadata:
  name: flag-source
  namespace: openfeatureplayground-team70-preprod-cd
spec:
  sources:
  - source: openfeatureplayground-team70-preprod-cd/sample-flags
    provider: flagd-proxy
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: "openfeatureplayground-team70"
  namespace: "openfeatureplayground-team70-preprod-cd"
  labels:
    dt.owner: "team70"
spec:
  replicas: 2
  strategy:
    canary:
      steps:
      - setWeight: 50
      - pause: {duration: 5s}
      - setWeight: 100
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: openfeaturedemo
  template:
    metadata:
      annotations:
        openfeature.dev/enabled: "true"
        openfeature.dev/featureflagsource: "openfeatureplayground-team70-preprod-cd/flag-source"
      labels:
        dt.owner: "team70"
        app.kubernetes.io/name: openfeaturedemo
        app.kubernetes.io/part-of: "openfeatureplayground-team70"
        app.kubernetes.io/version: "0.12.1"
        dynatrace-release-stage: "preprod"
    spec:
      containers:
      - name: openfeature-demo
        image: ghcr.io/open-feature/playground-app:v0.12.1
        args:
          - flagd
        env:
        - name: DT_RELEASE_VERSION
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app.kubernetes.io/version']
        - name: DT_RELEASE_PRODUCT
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app.kubernetes.io/part-of']
        - name: DT_RELEASE_STAGE
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['dynatrace-release-stage']
        - name: DT_CUSTOM_PROP
          value: "owner=team70 project=openfeatureplayground stage=preprod"
        - name: DT_TAGS
          value: "dt.owner=team70"
        ports:
        - name: http
          containerPort: 30000
          protocol: TCP
        resources:
          requests:
            memory: 3Mi
            cpu: 5m

kubectl version

$ kubectl version
Client Version: v1.28.1-eks-43840fb
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3-eks-4f4795d
@toddbaert
Member

toddbaert commented Dec 7, 2023

@agardnerIT I'm not too familiar with Argo Rollouts, but I'm a bit suspicious that the openfeature.dev/enabled annotation isn't working in that context (we'd typically see it in Deployments). Specifically, I'm wondering whether it even makes it onto the pod (where it needs to be) from there.

I see in the Rollout docs that you can reference a workload (deployment) with workloadRef instead of using a template directly. Can you try that with a deployment that has the annotation?
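
To illustrate, here's a minimal sketch of that layout, reusing the names from the manifests above; the Deployment name openfeaturedemo-workload is hypothetical, and the env details are trimmed:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: openfeatureplayground-team70
  namespace: openfeatureplayground-team70-preprod-cd
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: openfeaturedemo
  # Reference an existing Deployment instead of an inline pod template.
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openfeaturedemo-workload  # hypothetical name
  strategy:
    canary:
      steps:
      - setWeight: 50
      - pause: {duration: 5s}
      - setWeight: 100
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openfeaturedemo-workload
  namespace: openfeatureplayground-team70-preprod-cd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: openfeaturedemo
  template:
    metadata:
      annotations:
        # The OpenFeature annotations now live on the Deployment's pod template.
        openfeature.dev/enabled: "true"
        openfeature.dev/featureflagsource: "openfeatureplayground-team70-preprod-cd/flag-source"
      labels:
        app.kubernetes.io/name: openfeaturedemo
    spec:
      containers:
      - name: openfeature-demo
        image: ghcr.io/open-feature/playground-app:v0.12.1
        args:
          - flagd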

@odubajDT
Contributor

odubajDT commented Dec 7, 2023

I guess the difference between the index creation here and the index used when backfilling the permissions here might be the problem.

The index creation happens with spec.template.metadata.annotations.openfeature.dev/allowkubernetessync, but the permission-backfilling function searches for metadata.annotations.openfeature.dev/allowkubernetessync. I would say these need to be aligned. Honestly, I'm not sure which one is correct; I'll try to find out when this was changed. It might also be me, as I did a bigger refactoring of the webhooks recently and this probably slipped through.

@odubajDT
Contributor

odubajDT commented Dec 7, 2023

Ok, I found the problem, and I introduced it: before the refactoring, the indexer created an index for metadata.annotations.openfeature.dev/allowkubernetessync; currently it creates an index for spec.template.metadata.annotations.openfeature.dev/allowkubernetessync.

That makes sense, as this annotation is injected into the pod, not into the deployment. Therefore, the indexed field should be the annotation on the pod, not the annotation on the pod template that is part of the deployment.
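
For context, a minimal sketch of how the corrected field index would be registered with controller-runtime; this is illustrative, not the operator's actual code, and the registerPodIndex helper name is hypothetical:

package operator

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const syncAnnotation = "openfeature.dev/allowkubernetessync"

// registerPodIndex (hypothetical) indexes pods by the annotation on the
// pod itself (metadata.annotations...), not by the annotation on the
// deployment's pod template (spec.template.metadata.annotations...),
// so the backfill lookup and the index registration use the same key.
func registerPodIndex(mgr ctrl.Manager) error {
	return mgr.GetFieldIndexer().IndexField(
		context.Background(),
		&corev1.Pod{},
		"metadata.annotations."+syncAnnotation,
		func(obj client.Object) []string {
			return []string{obj.GetAnnotations()[syncAnnotation]}
		},
	)
}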

I already have a fix and will submit a PR soon

Sorry about the problems

@odubajDT
Contributor

odubajDT commented Dec 7, 2023

Fix available here #582

The bug was introduced here d234410#diff-2873f79a86c0d8b3335cd7731b0ecf7dd4301eb19a82ef7a1cba7589b5252261L188

@thisthat thisthat added the bug Something isn't working label Dec 7, 2023
@toddbaert toddbaert reopened this Dec 8, 2023
@toddbaert
Member

Reopening this pending resolution of this comment.

@odubajDT
Contributor

Judging by Adam's manifests, I think the enablement of OF should be OK. Also, if OF were not enabled, we wouldn't see the backfill-permissions errors; we wouldn't be able to reach that part of the code at all.

Maybe Argo was creating a Pod in a completely different namespace at that time (completely unrelated to this Deployment)? Then it would make sense that the podMutator kicks in, checks whether this Pod is annotated, finds out it's not, writes out the message, and lets the pod be bound to a node without any mutation. I think we need more info about what was happening in the system generally... Just thinking out loud about what the possibilities might be; I don't have any proof for this theory.
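
To make the theory concrete, the early-exit path that would produce the "OpenFeature is disabled" responses presumably looks something like this; a rough sketch in the standard controller-runtime admission shape, not the operator's actual code:

package webhooks

import (
	"context"
	"net/http"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// PodMutator is a hypothetical stand-in for the operator's pod webhook.
type PodMutator struct {
	decoder *admission.Decoder
}

func (m *PodMutator) Handle(ctx context.Context, req admission.Request) admission.Response {
	pod := &corev1.Pod{}
	if err := m.decoder.Decode(req, pod); err != nil {
		return admission.Errored(http.StatusBadRequest, err)
	}
	// A pod without the opt-in annotation (e.g. one Argo creates in an
	// unrelated namespace) is admitted unmodified with the logged reason.
	if pod.Annotations["openfeature.dev/enabled"] != "true" {
		return admission.Allowed("OpenFeature is disabled")
	}
	// ... otherwise inject the flagd sidecar and return a patched response ...
	return admission.Allowed("")
}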

@agardnerIT can you maybe provide us with some more insight? Thank you!

@edxz7

edxz7 commented Dec 24, 2023

Hi, I'm also facing the same error.

I started playing with the killercoda scenario for the OpenFeature Operator for a POC, but after following the instructions I noticed that the changes made to the CRs had no effect on the demo app, so I investigated further and found that:

  • The flagd sidecar wasn't injected into the pod running the app
  • I checked the logs of the pod running the operator and found the exact same error reported here:
(screenshot, 2023-12-23: operator pod logs showing the same backfill-permissions error)

Note: I had previously made the changes mentioned here to migrate the CRs for OpenFeature Operator versions above 0.5.0, because the example files are outdated.

I also tried installing everything in my k8s cluster following the instructions provided in your docs.

After not seeing the flagd sidecar being injected, I checked the logs of the pod running the operator and again found the same error:

2023-12-23T20:20:26Z	ERROR	setup	podMutator backfill permissions error	{"error": "Index with name field:metadata.annotations.openfeature.dev/allowkubernetessync does not exist"}

and I also have this in the logs:

2023-12-23T20:20:39Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/mutate-v1-pod", "UID": "d18b1152-6286-43df-bc5c-e2659363a1c5", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-12-23T20:20:39Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-v1-pod", "code": 200, "reason": "OpenFeature is disabled", "UID": "d18b1152-6286-43df-bc5c-e2659363a1c5", "allowed": true}

So the issue is not related to Argo. It's also worth mentioning that I tried versions 0.5.1 and 0.5.0 of the OF operator, but they all have the same bug; you can confirm this by installing those versions in the killercoda scenario I mentioned above.

@toddbaert
Member

@edxz7 @agardnerIT I've released 0.5.3, which should (at the very least partially) resolve this issue. Please confirm when you have time.

@edxz7

edxz7 commented Dec 30, 2023

Hello @toddbaert

I did a quick test in the killercoda scenario for the OpenFeature Operator with the version you just released (0.5.3). The ERROR setup podMutator backfill permissions error message is gone, but OpenFeature is still disabled; I got the same logs as before:

2023-12-30T06:32:09Z    DEBUG   controller-runtime.webhook.webhooks     wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "OpenFeature is disabled", "UID": "fe49eaf0-2906-4f44-aac1-e2bb1203c873", "allowed": true}

and the OFO still doesn't inject the sidecar. I don't know if this is the expected behavior (having OpenFeature disabled by default), and if it is, how can I enable it manually?

Btw, I also tried to test it in my k8s cluster (which runs on EKS), but the installation of the new version ends with an error; these are the logs of the installation:

clusterrolebinding.rbac.authorization.k8s.io/open-feature-operator-flagd-kubernetes-sync configured
clusterrolebinding.rbac.authorization.k8s.io/open-feature-operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/open-feature-operator-proxy-rolebinding unchanged
configmap/open-feature-operator-manager-config unchanged
service/open-feature-operator-controller-manager-metrics-service unchanged
service/open-feature-operator-webhook-service unchanged
deployment.apps/open-feature-operator-controller-manager unchanged
certificate.cert-manager.io/open-feature-operator-serving-cert unchanged
issuer.cert-manager.io/open-feature-operator-selfsigned-issuer unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/open-feature-operator-mutating-webhook-configuration configured
error: no matching resources found

I don't know why that error happens, but the pods run without issues and produce the same logs as in the killercoda scenario; that is, the ERROR setup podMutator backfill permissions error message is gone, but the OpenFeature is disabled message is still present.

@toddbaert
Member

I was just looking at the killercoda demo and there may be other issues, but the main one is that the demo hasn't been updated since our change to v1beta1 (migration path outlined here). It's using older CR definitions that are no longer supported in beta. I have created an issue to update the killercoda example and to lock it to a specific OFO version to prevent this from happening again.

@edxz7 sorry about that!

I believe that outside of killercoda, things are OK. I just ran through https://openfeature.dev/docs/tutorials/ofo/ and it works well. If you can run kind on your machine, I recommend going through that - the content is similar and up-to-date.

@agardnerIT
Contributor Author

agardnerIT commented Jan 4, 2024

Hi gents, I think I've now updated the killercoda OpenFeature Operator demo for these fixes.

Please try: https://killercoda.com/agardnerit/scenario/testing-ground and LMK here. If all is well, I'll open a PR.

One thing I noticed is a typo in the docs tutorial. When the welcome message flag is changed, the second text is Fib3r: Math at the speed of the internet!, not Welcome to Fib3r: Fibonacci as a Service!, so the docs need to be updated. (open-feature/openfeature.dev#321 created to track this fix.)

Change Summary:

  1. Pinned to a specific OFO version (see intro_foreground.sh)
  2. Updated text to match the docs tutorial (see note about the bug above)
  3. Due to the way flagd appends the port number and the way killercoda builds URLs, there's a weird set-up-and-replace step in the CRD (see the sketch after this list). First we get the external URL from killercoda, replace their PORT with 30002, and remove the leading https:// (otherwise we end up with duplicates). Then we set FLAGD_HOST_WEB to the correct value and FLAGD_PORT_WEB to 443 (if omitted, flagd will append :8013, which breaks things). Finally, FLAGD_TLS: true re-adds https://; otherwise we end up with http://https://KILLERCODA-URL...
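
For illustration, the resulting app config ends up roughly like this (a sketch based on the description above; the killercoda hostname is a placeholder):

env:
- name: FLAGD_HOST_WEB
  # External killercoda URL, with killercoda's PORT replaced by 30002
  # and the leading https:// stripped (placeholder hostname).
  value: "SESSION-30002.environments.killercoda.com"
- name: FLAGD_PORT_WEB
  # If omitted, flagd appends the default :8013, which breaks things.
  value: "443"
- name: FLAGD_TLS
  # Re-adds https://; otherwise we'd end up with http://https://...
  value: "true"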

@beeme1mr
Member

beeme1mr commented Jan 4, 2024

Thanks @agardnerIT, I've updated the tutorial to include the correct banner text.

The tutorial works great, but there were a few minor issues I noticed.

Thanks!

@agardnerIT
Contributor Author

Sorry it's taken so long, but I think the issues above have been addressed. You should see DEBUG_VERSION=6 at the top of the terminal output.

If you give me the thumbs up, I'll get the PR done.

@agardnerIT
Contributor Author

@beeme1mr just chasing this up. Can we merge the tutorial into the main repo?

@beeme1mr
Member

Yes, please open a PR when you have a moment.

@beeme1mr
Member

beeme1mr commented May 8, 2024

@agardnerIT please update the Killercoda tutorials when you have a moment. I'm going to close this ticket because the underlying issue has been addressed.

@beeme1mr beeme1mr closed this as completed May 8, 2024