A dependent Statefulset resource is always updated when reconcile is triggered #1989
Comments
Hi @moayad-alyaghshi, as a quick fix you can use the legacy matching algorithm, see sample: But to look into this further I need more information about this case. Could you please turn on trace-level logging for the SSABasedGenericKubernetesResourceMatcher and send the logs for this line: I will reproduce and fix the issue based on that. Thank you! |
Hi @csviri, thank you for your quick reply. The legacy matcher is working as expected. Here is the log entry of the line you asked for: 2023-08-01 13:37:09,598 TRACE [io.jav.ope.pro.dep.kub.SSABasedGenericKubernetesResourceMatcher] (pool-73-thread-8) Original actual:
StatefulSet(apiVersion=apps/v1, kind=StatefulSet, metadata=ObjectMeta(annotations={}, creationTimestamp=2023-07-31T14:58:51Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[], generateName=null, generation=1, labels={app.kubernetes.io/instance=tlod, app.kubernetes.io/managed-by=nsql-operator, app.kubernetes.io/name=nsql, app.kubernetes.io/version=10-snapshot, server=primary}, managedFields=[ManagedFieldsEntry(apiVersion=apps/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:metadata={f:labels={f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:server={}}, f:ownerReferences={k:{"uid":"a9e34306-30e9-4595-812e-3b52b31dd995"}={}}}, f:spec={f:selector={}, f:serviceName={}, f:template={f:metadata={f:annotations={f:proxy.istio.io/config={}}, f:labels={f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:server={}}, f:name={}}, f:spec={f:affinity={f:podAntiAffinity={f:preferredDuringSchedulingIgnoredDuringExecution={}}}, f:containers={k:{"name":"nsql"}={.={}, f:env={k:{"name":"NSQL_ACCEPT_EULA"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_DB"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_DBID_NODE"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_DBSA"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_DOCKER_LOGGING"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_HOST_NAME"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_LICENSE_HOME"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_POLLING_FROM_DB_HOST_NAME"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_POLLING_TO_DB_HOST_NAME"}={.={}, f:name={}, f:value={}}, k:{"name":"NSQL_REPLICA_LIMIT_1"}={.={}, f:name={}, f:value={}}}, f:image={}, f:imagePullPolicy={}, f:livenessProbe={f:exec={f:command={}}, f:periodSeconds={}, f:timeoutSeconds={}}, f:name={}, f:ports={k:{"containerPort":5019,"protocol":"TCP"}={.={}, f:containerPort={}, 
f:name={}, f:protocol={}}}, f:readinessProbe={f:exec={f:command={}}, f:periodSeconds={}, f:timeoutSeconds={}}, f:resources={f:requests={f:cpu={}, f:memory={}}}, f:startupProbe={f:exec={f:command={}}, f:failureThreshold={}, f:periodSeconds={}}, f:volumeMounts={k:{"mountPath":"/opt/actian/nsql/db"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/actian/nsql/docker/docker-entrypoint-custom-init-scripts"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/actian/nsql/docker/docker-entrypoint-custom-shutdown-scripts"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/actian/nsql/license"}={.={}, f:mountPath={}, f:name={}}}}}, f:imagePullSecrets={k:{"name":"docker-registry"}={}}, f:volumes={k:{"name":"custom-init-scripts"}={.={}, f:configMap={f:defaultMode={}, f:name={}}, f:name={}}, k:{"name":"custom-shutdown-scripts"}={.={}, f:configMap={f:defaultMode={}, f:name={}}, f:name={}}, k:{"name":"nsql-license-volume"}={.={}, f:name={}, f:secret={f:optional={}, f:secretName={}}}}}}, f:volumeClaimTemplates={}}}), manager=nsqlserverreconciler, operation=Apply, subresource=null, time=2023-07-31T14:58:51Z, additionalProperties={}), ManagedFieldsEntry(apiVersion=apps/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:status={f:availableReplicas={}, f:collisionCount={}, f:currentReplicas={}, f:currentRevision={}, f:observedGeneration={}, f:readyReplicas={}, f:replicas={}, f:updateRevision={}, f:updatedReplicas={}}}), manager=kube-controller-manager, operation=Update, subresource=status, time=2023-08-01T11:13:08Z, additionalProperties={})], name=tlod, namespace=default, ownerReferences=[OwnerReference(apiVersion=nsqloperator.actian.com/v1, blockOwnerDeletion=null, controller=null, kind=NSQLServer, name=tlod, uid=a9e34306-30e9-4595-812e-3b52b31dd995, additionalProperties={})], resourceVersion=8067615, selfLink=null, uid=5ac34d4d-5790-436d-bc62-9a59fe601368, additionalProperties={}), spec=StatefulSetSpec(minReadySeconds=null, ordinals=null, 
persistentVolumeClaimRetentionPolicy=null, podManagementPolicy=OrderedReady, replicas=1, revisionHistoryLimit=10, selector=LabelSelector(matchExpressions=[], matchLabels={app.kubernetes.io/instance=tlod, app.kubernetes.io/name=nsql, server=primary}, additionalProperties={}), serviceName=tlod-headless, template=PodTemplateSpec(metadata=ObjectMeta(annotations={proxy.istio.io/config={ holdApplicationUntilProxyStarts: true }}, creationTimestamp=null, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[], generateName=null, generation=null, labels={app.kubernetes.io/instance=tlod, app.kubernetes.io/managed-by=nsql-operator, app.kubernetes.io/name=nsql, app.kubernetes.io/version=10-snapshot, server=primary}, managedFields=[], name=tlod, namespace=null, ownerReferences=[], resourceVersion=null, selfLink=null, uid=null, additionalProperties={}), spec=PodSpec(activeDeadlineSeconds=null, affinity=Affinity(nodeAffinity=null, podAffinity=null, podAntiAffinity=PodAntiAffinity(preferredDuringSchedulingIgnoredDuringExecution=[WeightedPodAffinityTerm(podAffinityTerm=PodAffinityTerm(labelSelector=LabelSelector(matchExpressions=[LabelSelectorRequirement(key=app.kubernetes.io/instance, operator=In, values=[tlod], additionalProperties={}), LabelSelectorRequirement(key=app.kubernetes.io/name, operator=In, values=[nsql], additionalProperties={})], matchLabels={}, additionalProperties={}), namespaceSelector=null, namespaces=[], topologyKey=topology.kubernetes.io/zone, additionalProperties={}), weight=2, additionalProperties={}), WeightedPodAffinityTerm(podAffinityTerm=PodAffinityTerm(labelSelector=LabelSelector(matchExpressions=[LabelSelectorRequirement(key=app.kubernetes.io/instance, operator=In, values=[tlod], additionalProperties={}), LabelSelectorRequirement(key=app.kubernetes.io/name, operator=In, values=[nsql], additionalProperties={})], matchLabels={}, additionalProperties={}), namespaceSelector=null, namespaces=[], topologyKey=kubernetes.io/hostname, 
additionalProperties={}), weight=1, additionalProperties={})], requiredDuringSchedulingIgnoredDuringExecution=[], additionalProperties={}), additionalProperties={}), automountServiceAccountToken=null, containers=[Container(args=[], command=[], env=[EnvVar(name=NSQL_DBSA, value=nsql, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_DOCKER_LOGGING, value=true, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_DB, value=/opt/actian/nsql/db, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_LICENSE_HOME, value=/opt/actian/nsql/license, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_ACCEPT_EULA, value=yes, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_HOST_NAME, value=tlod, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_DBID_NODE, value=tlod, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_REPLICA_LIMIT_1, value=on, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_POLLING_FROM_DB_HOST_NAME, value=tlod-0.tlod-headless.default, valueFrom=null, additionalProperties={}), EnvVar(name=NSQL_POLLING_TO_DB_HOST_NAME, value=tlod-r-0.tlod-headless.default, valueFrom=null, additionalProperties={})], envFrom=[], image=docker.io/actian/nsql-dev:10-snapshot, imagePullPolicy=IfNotPresent, lifecycle=null, livenessProbe=Probe(exec=ExecAction(command=[oscp, -l, @localhost], additionalProperties={}), failureThreshold=3, grpc=null, httpGet=null, initialDelaySeconds=null, periodSeconds=30, successThreshold=1, tcpSocket=null, terminationGracePeriodSeconds=null, timeoutSeconds=2, additionalProperties={}), name=nsql, ports=[ContainerPort(containerPort=5019, hostIP=null, hostPort=null, name=tcp-nsql, protocol=TCP, additionalProperties={})], readinessProbe=Probe(exec=ExecAction(command=[oscp, -l, @localhost], additionalProperties={}), failureThreshold=3, grpc=null, httpGet=null, initialDelaySeconds=null, periodSeconds=30, successThreshold=1, tcpSocket=null, terminationGracePeriodSeconds=null, timeoutSeconds=2, 
additionalProperties={}), resources=ResourceRequirements(claims=[], limits={}, requests={cpu=500m, memory=2Gi}, additionalProperties={}), securityContext=null, startupProbe=Probe(exec=ExecAction(command=[cat, /opt/actian/nsql/docker/.container_ready], additionalProperties={}), failureThreshold=100, grpc=null, httpGet=null, initialDelaySeconds=null, periodSeconds=3, successThreshold=1, tcpSocket=null, terminationGracePeriodSeconds=null, timeoutSeconds=1, additionalProperties={}), stdin=null, stdinOnce=null, terminationMessagePath=/dev/termination-log, terminationMessagePolicy=File, tty=null, volumeDevices=[], volumeMounts=[VolumeMount(mountPath=/opt/actian/nsql/db, mountPropagation=null, name=data, readOnly=null, subPath=null, subPathExpr=null, additionalProperties={}), VolumeMount(mountPath=/opt/actian/nsql/license, mountPropagation=null, name=nsql-license-volume, readOnly=null, subPath=null, subPathExpr=null, additionalProperties={}), VolumeMount(mountPath=/opt/actian/nsql/docker/docker-entrypoint-custom-init-scripts, mountPropagation=null, name=custom-init-scripts, readOnly=null, subPath=null, subPathExpr=null, additionalProperties={}), VolumeMount(mountPath=/opt/actian/nsql/docker/docker-entrypoint-custom-shutdown-scripts, mountPropagation=null, name=custom-shutdown-scripts, readOnly=null, subPath=null, subPathExpr=null, additionalProperties={})], workingDir=null, additionalProperties={})], dnsConfig=null, dnsPolicy=ClusterFirst, enableServiceLinks=null, ephemeralContainers=[], hostAliases=[], hostIPC=null, hostNetwork=null, hostPID=null, hostUsers=null, hostname=null, imagePullSecrets=[LocalObjectReference(name=docker-registry, additionalProperties={})], initContainers=[], nodeName=null, nodeSelector={}, os=null, overhead={}, preemptionPolicy=null, priority=null, priorityClassName=null, readinessGates=[], resourceClaims=[], restartPolicy=Always, runtimeClassName=null, schedulerName=default-scheduler, schedulingGates=[], 
securityContext=PodSecurityContext(fsGroup=null, fsGroupChangePolicy=null, runAsGroup=null, runAsNonRoot=null, runAsUser=null, seLinuxOptions=null, seccompProfile=null, supplementalGroups=[], sysctls=[], windowsOptions=null, additionalProperties={}), serviceAccount=null, serviceAccountName=null, setHostnameAsFQDN=null, shareProcessNamespace=null, subdomain=null, terminationGracePeriodSeconds=30, tolerations=[], topologySpreadConstraints=[], volumes=[Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, ephemeral=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=nsql-license-volume, nfs=null, persistentVolumeClaim=null, photonPersistentDisk=null, portworxVolume=null, projected=null, quobyte=null, rbd=null, scaleIO=null, secret=SecretVolumeSource(defaultMode=420, items=[], optional=true, secretName=nsql-license, additionalProperties={}), storageos=null, vsphereVolume=null, additionalProperties={}), Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=ConfigMapVolumeSource(defaultMode=493, items=[], name=tlod-custom-init-scripts, optional=null, additionalProperties={}), csi=null, downwardAPI=null, emptyDir=null, ephemeral=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=custom-init-scripts, nfs=null, persistentVolumeClaim=null, photonPersistentDisk=null, portworxVolume=null, projected=null, quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={}), Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=ConfigMapVolumeSource(defaultMode=493, items=[], name=tlod-custom-shutdown-scripts, optional=null, additionalProperties={}), csi=null, downwardAPI=null, 
emptyDir=null, ephemeral=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=custom-shutdown-scripts, nfs=null, persistentVolumeClaim=null, photonPersistentDisk=null, portworxVolume=null, projected=null, quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={})], additionalProperties={}), additionalProperties={}), updateStrategy=StatefulSetUpdateStrategy(rollingUpdate=RollingUpdateStatefulSetStrategy(maxUnavailable=null, partition=0, additionalProperties={}), type=RollingUpdate, additionalProperties={}), volumeClaimTemplates=[PersistentVolumeClaim(apiVersion=v1, kind=PersistentVolumeClaim, metadata=ObjectMeta(annotations={}, creationTimestamp=null, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[], generateName=null, generation=null, labels={}, managedFields=[], name=data, namespace=null, ownerReferences=[], resourceVersion=null, selfLink=null, uid=null, additionalProperties={}), spec=PersistentVolumeClaimSpec(accessModes=[ReadWriteOnce], dataSource=null, dataSourceRef=null, resources=ResourceRequirements(claims=[], limits={}, requests={storage=10Gi}, additionalProperties={}), selector=null, storageClassName=null, volumeMode=Filesystem, volumeName=null, additionalProperties={}), status=PersistentVolumeClaimStatus(accessModes=[], allocatedResources={}, capacity={}, conditions=[], phase=Pending, resizeStatus=null, additionalProperties={}), additionalProperties={})], additionalProperties={}), status=StatefulSetStatus(availableReplicas=1, collisionCount=0, conditions=[], currentReplicas=1, currentRevision=tlod-bb5c5cf89, observedGeneration=1, readyReplicas=1, replicas=1, updateRevision=tlod-bb5c5cf89, updatedReplicas=1, additionalProperties={}), additionalProperties={})
original desired:
{apiVersion=apps/v1, kind=StatefulSet, metadata={labels={server=primary, app.kubernetes.io/managed-by=nsql-operator, app.kubernetes.io/name=nsql, app.kubernetes.io/instance=tlod, app.kubernetes.io/version=10-snapshot}, name=tlod, namespace=default, ownerReferences=[{apiVersion=nsqloperator.actian.com/v1, kind=NSQLServer, name=tlod, uid=a9e34306-30e9-4595-812e-3b52b31dd995}]}, spec={selector={matchLabels={app.kubernetes.io/name=nsql, server=primary, app.kubernetes.io/instance=tlod}}, serviceName=tlod-headless, template={metadata={annotations={proxy.istio.io/config={ holdApplicationUntilProxyStarts: true }}, labels={server=primary, app.kubernetes.io/managed-by=nsql-operator, app.kubernetes.io/name=nsql, app.kubernetes.io/instance=tlod, app.kubernetes.io/version=10-snapshot}, name=tlod}, spec={affinity={podAntiAffinity={preferredDuringSchedulingIgnoredDuringExecution=[{podAffinityTerm={labelSelector={matchExpressions=[{key=app.kubernetes.io/instance, operator=In, values=[tlod]}, {key=app.kubernetes.io/name, operator=In, values=[nsql]}]}, topologyKey=topology.kubernetes.io/zone}, weight=2}, {podAffinityTerm={labelSelector={matchExpressions=[{key=app.kubernetes.io/instance, operator=In, values=[tlod]}, {key=app.kubernetes.io/name, operator=In, values=[nsql]}]}, topologyKey=kubernetes.io/hostname}, weight=1}]}}, containers=[{env=[{name=NSQL_DBSA, value=nsql}, {name=NSQL_DOCKER_LOGGING, value=true}, {name=NSQL_DB, value=/opt/actian/nsql/db}, {name=NSQL_LICENSE_HOME, value=/opt/actian/nsql/license}, {name=NSQL_ACCEPT_EULA, value=yes}, {name=NSQL_HOST_NAME, value=tlod}, {name=NSQL_DBID_NODE, value=tlod}, {name=NSQL_REPLICA_LIMIT_1, value=on}, {name=NSQL_POLLING_FROM_DB_HOST_NAME, value=tlod-0.tlod-headless.default}, {name=NSQL_POLLING_TO_DB_HOST_NAME, value=tlod-r-0.tlod-headless.default}], image=docker.io/actian/nsql-dev:10-snapshot, imagePullPolicy=IfNotPresent, livenessProbe={exec={command=[oscp, -l, @localhost]}, periodSeconds=30, timeoutSeconds=2}, name=nsql, 
ports=[{containerPort=5019, name=tcp-nsql, protocol=TCP}], readinessProbe={exec={command=[oscp, -l, @localhost]}, periodSeconds=30, timeoutSeconds=2}, resources={requests={cpu=500m, memory=2Gi}}, startupProbe={exec={command=[cat, /opt/actian/nsql/docker/.container_ready]}, failureThreshold=100, periodSeconds=3}, volumeMounts=[{mountPath=/opt/actian/nsql/db, name=data}, {mountPath=/opt/actian/nsql/license, name=nsql-license-volume}, {mountPath=/opt/actian/nsql/docker/docker-entrypoint-custom-init-scripts, name=custom-init-scripts}, {mountPath=/opt/actian/nsql/docker/docker-entrypoint-custom-shutdown-scripts, name=custom-shutdown-scripts}]}], imagePullSecrets=[{name=docker-registry}], volumes=[{name=nsql-license-volume, secret={optional=true, secretName=nsql-license}}, {configMap={defaultMode=493, name=tlod-custom-init-scripts}, name=custom-init-scripts}, {configMap={defaultMode=493, name=tlod-custom I can provide a sample project in case that is needed. Thank you |
Thanks, the second log message seems to be truncated. A simple project to reproduce the issue would be great, thank you! |
You can find the sample project attached. CR:
|
The cause of the problem: SSA marks the whole persistentVolumeClaim as managed by our controller, see:
However, the StatefulSet controller adds a status field there, which is not listed in the managed fields. (Ideally the managed fields would explicitly list the fields managed by our controller, with this one excluded.) See in resources:
This is an issue with the SSA/StatefulSet controller implementation in Kubernetes. I don't see any elegant solution for the SSA matcher, other than maybe providing an ignore list. Maybe we could explicitly handle these known cases. cc @metacosm |
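To make the mechanism above concrete, here is a minimal, self-contained sketch in plain Java. It is a simplified model and not the JOSDK matcher: resources are nested maps, the `f:` prefixes of managed fields are dropped, and the class and method names (`SsaMatchSketch`, `prune`, `ignoreTemplateStatus`) are hypothetical. It assumes, as in the trace log above, that `volumeClaimTemplates` appears in the managed fields as an empty entry, so the whole list is treated as owned by the controller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SsaMatchSketch {

    // Keep only the parts of `actual` that `managed` lists. An empty managed
    // entry (modelled on f:volumeClaimTemplates={}) means "the whole value is
    // ours", so the value is copied as-is, including server-added fields.
    @SuppressWarnings("unchecked")
    static Map<String, Object> prune(Map<String, Object> actual, Map<String, Object> managed) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : managed.entrySet()) {
            if (!actual.containsKey(e.getKey())) {
                continue;
            }
            Map<String, Object> sub = (Map<String, Object>) e.getValue();
            Object value = actual.get(e.getKey());
            if (sub.isEmpty() || !(value instanceof Map)) {
                result.put(e.getKey(), value);
            } else {
                result.put(e.getKey(), prune((Map<String, Object>) value, sub));
            }
        }
        return result;
    }

    // Hypothetical "ignore list" step: drop `status` from every item under
    // spec.volumeClaimTemplates before comparing with the desired state.
    @SuppressWarnings("unchecked")
    static Map<String, Object> ignoreTemplateStatus(Map<String, Object> resource) {
        Map<String, Object> copy = new LinkedHashMap<>(resource);
        Map<String, Object> spec = new LinkedHashMap<>((Map<String, Object>) resource.get("spec"));
        List<Object> cleaned = new ArrayList<>();
        for (Object item : (List<Object>) spec.get("volumeClaimTemplates")) {
            Map<String, Object> template = new LinkedHashMap<>((Map<String, Object>) item);
            template.remove("status");
            cleaned.add(template);
        }
        spec.put("volumeClaimTemplates", cleaned);
        copy.put("spec", spec);
        return copy;
    }

    // Heavily trimmed stand-in for the StatefulSet returned by the API server.
    static Map<String, Object> actual() {
        return Map.of(
            "spec", Map.of(
                "serviceName", "tlod-headless",
                "replicas", 1, // defaulted by the server, not managed by us
                "volumeClaimTemplates", List.of(Map.of(
                    "metadata", Map.of("name", "data"),
                    "spec", Map.of("accessModes", List.of("ReadWriteOnce")),
                    "status", Map.of("phase", "Pending")))), // added by the server
            "status", Map.of("replicas", 1));
    }

    // Simplified managed-fields tree for our controller.
    static Map<String, Object> managed() {
        return Map.of("spec", Map.of(
            "serviceName", Map.of(),
            "volumeClaimTemplates", Map.of()));
    }

    // What the reconciler wants to apply: no status inside the template.
    static Map<String, Object> desired() {
        return Map.of("spec", Map.of(
            "serviceName", "tlod-headless",
            "volumeClaimTemplates", List.of(Map.of(
                "metadata", Map.of("name", "data"),
                "spec", Map.of("accessModes", List.of("ReadWriteOnce"))))));
    }

    public static void main(String[] args) {
        Map<String, Object> pruned = prune(actual(), managed());
        // The template status survives pruning, so the comparison fails.
        System.out.println(pruned.equals(desired()));                       // false
        // With the ignored path removed, actual and desired match.
        System.out.println(ignoreTemplateStatus(pruned).equals(desired())); // true
    }
}
```

In this model the unmanaged `replicas` field is pruned away correctly, while `status={phase=Pending}` is kept because it sits inside a value that is marked as managed as a whole. That is why the comparison reports a mismatch on every reconcile unless the path is ignored.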
see also: https://kubernetes.slack.com/archives/C0123CNN8F3/p1690977112340099 (will create an issue in Kubernetes if they don't reply here) |
Yes, that was exactly our finding. I will keep an eye on the progress of this issue, and we will use the legacy matcher for the time being. Thank you |
Hi @csviri, I see that the issue is closed and there is a merged pull request. When can we expect the fix to be released, and in which version? Thank you |
Hi @moayad-alyaghshi, yes, these usually go into the next minor release, so it will be in 4.5, which will hopefully be released this week. |
We have a reconciler which configures a StatefulSet as a dependent resource using CRUDKubernetesDependentResource. Everything was working as expected before, but after updating to the latest version of the SDK (we're using Quarkus, so we updated the quarkus-operator-sdk to version 6.2.1, which corresponds to JOSDK version 4.4 as far as I know), the StatefulSet is always updated even when there's no change to the Spec.
After checking, we found that the matcher SSABasedGenericKubernetesResourceMatcher always reports a mismatch between actual and desired. The exact reason is that the actual StatefulSet Spec has the entry status={phase=Pending} as part of volumeClaimTemplates. Is there a mechanism to avoid this other than overriding the match method?