Readiness Probes Failed. #285

Open
manojguptavonage opened this issue Jan 12, 2023 · 17 comments

@manojguptavonage

The API and UAT pods are failing because their readiness probes never succeed. I am following all the steps mentioned in the README, but I always run into the same issue.

I am running it in minikube with docker.


Details:

API Pod:

Name:             report-reportportal-api-67fdb8577f-g5m2w
Namespace:        default
Priority:         0
Service Account:  default
Node:             minikube/192.168.49.2
Start Time:       Thu, 12 Jan 2023 22:58:38 +0000
Labels:           component=report-reportportal-api
                  pod-template-hash=67fdb8577f
Annotations:      <none>
Status:           Running
IP:               172.17.0.7
IPs:
  IP:           172.17.0.7
Controlled By:  ReplicaSet/report-reportportal-api-67fdb8577f
Containers:
  api:
    Container ID:   docker://30e62fd182dbdc70f87af3003ec870e51a18cc09f5541c65323d8c8594ffeb7f
    Image:          reportportal/service-api:5.7.2
    Image ID:       docker-pullable://reportportal/service-api@sha256:9df41f8fb320092adba85922221b18280247797ab1aad4de4ecdb22fa35833a9
    Port:           8585/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 12 Jan 2023 22:58:39 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:      500m
      memory:   1Gi
    Readiness:  http-get http://:8585/health delay=90s timeout=90s period=90s #success=1 #failure=90
    Environment:
      LOGGING_LEVEL_ORG_HIBERNATE_SQL:          info
      RP_REQUESTLOGGING:                        false
      RP_AMQP_QUEUES:                           10
      RP_AMQP_QUEUESPERPOD:                     10
      JAVA_OPTS:                                -Djava.security.egd=file:/dev/./urandom -XX:+UseG1GC -XX:MinRAMPercentage=60.0 -XX:InitiatingHeapOccupancyPercent=70 -XX:MaxRAMPercentage=90.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
      RP_AMQP_HOST:                             rabbitmq-local.default.svc.cluster.local
      RP_AMQP_PORT:                             5672
      RP_AMQP_ANALYZER-VHOST:                   analyzer
      RP_AMQP_USER:                             rabbitmq
      RP_AMQP_PASS:                             <set to the key 'rabbitmq-password' in secret 'rabbitmq-local'>  Optional: false
      RP_AMQP_APIPORT:                          15672
      RP_AMQP_APIUSER:                          rabbitmq
      RP_AMQP_APIPASS:                          <set to the key 'rabbitmq-password' in secret 'rabbitmq-local'>  Optional: false
      RP_DB_HOST:                               postgres-postgresql.default.svc.cluster.local
      RP_DB_PORT:                               5432
      RP_DB_NAME:                               reportportal
      RP_DB_USER:                               rpuser
      RP_DB_PASS:                               <set to the key 'postgresql-password' in secret 'postgres-postgresql'>  Optional: false
      RP_BINARYSTORE_TYPE:                      minio
      RP_BINARYSTORE_MINIO_ENDPOINT:            http://minio-local.default.svc.cluster.local:9000
      RP_BINARYSTORE_MINIO_ACCESSKEY:           <set to the key 'access-key' in secret 'minio-local'>  Optional: false
      RP_BINARYSTORE_MINIO_SECRETKEY:           <set to the key 'secret-key' in secret 'minio-local'>  Optional: false
      MANAGEMENT_HEALTH_ELASTICSEARCH_ENABLED:  false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8zwp4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-8zwp4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  11m                  default-scheduler  Successfully assigned default/report-reportportal-api-67fdb8577f-g5m2w to minikube
  Normal   Pulled     11m                  kubelet            Container image "reportportal/service-api:5.7.2" already present on machine
  Normal   Created    11m                  kubelet            Created container api
  Normal   Started    11m                  kubelet            Started container api
  Warning  Unhealthy  88s (x6 over 8m58s)  kubelet            Readiness probe failed: Get "http://172.17.0.7:8585/health": dial tcp 172.17.0.7:8585: connect: connection refused


UAT Pod:


kubectl describe pod report-reportportal-uat-597c98f8d8-xh7p8
Name:             report-reportportal-uat-597c98f8d8-xh7p8
Namespace:        default
Priority:         0
Service Account:  default
Node:             minikube/192.168.49.2
Start Time:       Thu, 12 Jan 2023 22:58:38 +0000
Labels:           component=report-reportportal-uat
                  pod-template-hash=597c98f8d8
Annotations:      <none>
Status:           Running
IP:               172.17.0.16
IPs:
  IP:           172.17.0.16
Controlled By:  ReplicaSet/report-reportportal-uat-597c98f8d8
Containers:
  uat:
    Container ID:   docker://d98f6b6c86e3787d3d64e04320b2ee33f2a4b9fe204d2d4e1a1ecd82d729668a
    Image:          reportportal/service-authorization:5.7.0
    Image ID:       docker-pullable://reportportal/service-authorization@sha256:9e73114dbd151466624fe5d023f926fca5fbcad4a893119ce7b47d86d258c295
    Port:           9999/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 12 Jan 2023 23:10:05 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 12 Jan 2023 23:03:42 +0000
      Finished:     Thu, 12 Jan 2023 23:09:54 +0000
    Ready:          False
    Restart Count:  2
    Limits:
      cpu:     500m
      memory:  2Gi
    Requests:
      cpu:      100m
      memory:   512Mi
    Readiness:  http-get http://:9999/health delay=100s timeout=3s period=10s #success=1 #failure=5
    Environment:
      JAVA_OPTS:                       -Djava.security.egd=file:/dev/./urandom -XX:MinRAMPercentage=60.0 -XX:MaxRAMPercentage=90.0
      RP_SESSION_LIVE:                 86400
      RP_DB_HOST:                      postgres-postgresql.default.svc.cluster.local
      RP_DB_PORT:                      5432
      RP_DB_NAME:                      reportportal
      RP_DB_USER:                      rpuser
      RP_DB_PASS:                      <set to the key 'postgresql-password' in secret 'postgres-postgresql'>  Optional: false
      RP_BINARYSTORE_TYPE:             minio
      RP_BINARYSTORE_MINIO_ENDPOINT:   http://minio-local.default.svc.cluster.local:9000
      RP_BINARYSTORE_MINIO_ACCESSKEY:  <set to the key 'access-key' in secret 'minio-local'>  Optional: false
      RP_BINARYSTORE_MINIO_SECRETKEY:  <set to the key 'secret-key' in secret 'minio-local'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2wj6s (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-2wj6s:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  12m                   default-scheduler  Successfully assigned default/report-reportportal-uat-597c98f8d8-xh7p8 to minikube
  Normal   Pulled     7m43s (x2 over 12m)   kubelet            Container image "reportportal/service-authorization:5.7.0" already present on machine
  Normal   Created    7m43s (x2 over 12m)   kubelet            Created container uat
  Normal   Started    7m42s (x2 over 12m)   kubelet            Started container uat
  Warning  Unhealthy  2m36s (x41 over 10m)  kubelet            Readiness probe failed: Get "http://172.17.0.16:9999/health": dial tcp 172.17.0.16:9999: connect: connection refused


Can someone please share a reference to working Helm charts (YAML configs) and deployment steps? I have spent multiple hours on this issue, but it is still not resolved.

Thanks
@manojguptavonage
Author

I am following the steps below:

1/ minikube start --kubernetes-version=v1.21.0
A/helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx && helm repo update
B/ helm install nginx-ingress ingress-nginx/ingress-nginx --version 3.36.0

2/ minikube addons enable ingress

3/ helm install elasticsearch ./reportportal/charts/elasticsearch-7.10.2.tgz -f ./reportportal/elasticsearch/single-node-values.yaml

4/ helm install rabbitmq-local --set auth.username=rabbitmq,auth.password=rabbitmq,replicaCount=1 ./reportportal/charts/rabbitmq-7.5.6.tgz
A/ kubectl exec -it rabbitmq-local-0 -- rabbitmqctl set_vm_memory_high_watermark 0.8

5/ helm install postgres --set postgresqlUsername=rpuser,postgresqlPassword=rppass,postgresqlDatabase=reportportal,postgresqlPostgresPassword=password -f ./reportportal/postgresql/values.yaml ./reportportal/charts/postgresql-10.9.4.tgz

6/ helm install minio-local --set accessKey.password=minio,secretKey.password=minio123,persistence.size=40Gi ./reportportal/charts/minio-7.1.9.tgz

7/ Changes to values.yaml (check details below)

8/ helm package .//

9/ helm install report --set postgresql.SecretName=postgres-postgresql,rabbitmq.SecretName=rabbitmq-local,minio.secretName=minio-local ./reportportal-5.7.2.tgz
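
After step 9, it may help to confirm that every dependency installed in steps 3–6 is actually up before expecting the ReportPortal pods to pass their probes; a minimal check with standard helm/kubectl commands (release and label names are the ones used above):

```bash
# Releases from the steps above should all show as deployed
helm list

# Postgres, RabbitMQ, MinIO and Elasticsearch pods should be 1/1 Running and Ready
kubectl get pods -o wide

# Then watch the ReportPortal pods; readiness can take several minutes on minikube
kubectl get pods -l component=report-reportportal-api -w
```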

values.yaml (the only file changed in the folder):

## String to partially override reportportal.fullname template (will maintain the release name)
##
# nameOverride:

## String to fully override reportportal.fullname template
##
# fullnameOverride:

serviceindex:
  name: index
  repository: reportportal/service-index
  tag: 5.0.11
  pullPolicy: Always
  resources:
    requests:
      cpu: 150m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
  podAnnotations: {}
  securityContext: {}
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  service:
    portName: ""

uat:
  repository: reportportal/service-authorization
  name: uat
  tag: 5.7.0
  pullPolicy: Always
  resources:
    requests:
      cpu: 100m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 2048Mi
  sessionLiveTime: 86400
  podAnnotations: {}
  ## jvmArgs
  ## If you need to use a custom java keystore you can use it through jvmArgs
  ## eg. : -Djavax.net.ssl.trustStore=/etc/secret-volume/custom-pki.jks
  jvmArgs: "-Djava.security.egd=file:/dev/./urandom -XX:MinRAMPercentage=60.0 -XX:MaxRAMPercentage=90.0"
  ## External environment variables
  ##
  extraEnvs: []
    # - name: EXTRA_ENV
    #   value: "TRUE"
    # - name: EXTRA_ENV_SECRET
    #   valueFrom:
    #     secretKeyRef:
    #       name: "additional-credentials"
    #       key: username
  securityContext: {}
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  ## Provide a secret containing sensitives data
  ## eg. : provide a custom java keystore used in jvmArgs
  ##
  ## keytool -genkeypair -storetype jks -alias todelete -keypass changeit -storepass changeit -keystore custom-pki.jks -dname "CN=Developer, OU=Department, O=Company, L=City, ST=State, C=CA"
  ## keytool -delete -alias todelete -storepass changeit -keystore custom-pki.jks
  ## keytool -list -keystore custom-pki.jks -storepass changeit
  ##
  ## Generate base64 data and paste it in your values.yaml :
  ## cat custom-pki.jks | base64 -w
  secret:
    enabled: false
    mountPath: /etc/secret-volume
    readOnly: true
    data: {}
    #  custom-pki.jks: <base64-data>
  service:
    portName: ""

serviceui:
  repository: reportportal/service-ui
  tag: 5.7.2
  name: ui
  pullPolicy: Always
  resources:
    requests:
      cpu: 100m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 128Mi
  podAnnotations: {}
  securityContext: {}
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  service:
    portName: ""

serviceapi:
  repository: reportportal/service-api
  tag: 5.7.2
  name: api
  pullPolicy: Always
  replicaCount: 1
  readinessProbe:
    initialDelaySeconds: 30
    periodSeconds: 20
    timeoutSeconds: 3
    failureThreshold: 20
  resources:
    requests:
      cpu: 500m
      memory: 1024Mi
    limits:
      cpu: 1000m
      memory: 2048Mi
  jvmArgs: "-Djava.security.egd=file:/dev/./urandom -XX:+UseG1GC -XX:MinRAMPercentage=60.0 -XX:InitiatingHeapOccupancyPercent=70 -XX:MaxRAMPercentage=90.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
  ## Number of queues
  ## Where "totalNumber" is the total number of queues
  ## Calculation formula: perPodNumber = totalNumber / serviceapi.replicaCount
  queues:
    totalNumber:
    perPodNumber:
  ## External environment variables
  ##
  extraEnvs: []
    # - name: EXTRA_ENV
    #   value: "TRUE"
    # - name: EXTRA_ENV_SECRET
    #   valueFrom:
    #     secretKeyRef:
    #       name: "additional-credentials"
    #       key: username
  podAnnotations: {}
  securityContext: {}
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  service:
    portName: ""

servicejobs:
  repository: reportportal/service-jobs
  tag: 5.7.2
  name: jobs
  pullPolicy: Always
  coreJobs:
    cleanAttachmentCron: 0 0 */24 * * *
    cleanLogCron: 0 0 */24 * * *
    cleanLaunchCron: 0 0 */24 * * *
    cleanStorageCron: 0 0 */24 * * *
    storageProjectCron: 0 */5 * * * *
  chunksize: 1000
  resources:
    requests:
      cpu: 100m
      memory: 248Mi
    limits:
      cpu: 100m
      memory: 372Mi
  jvmArgs: ""
  ## External environment variables
  ##
  extraEnvs: []
    # - name: EXTRA_ENV
    #   value: "TRUE"
    # - name: EXTRA_ENV_SECRET
    #   valueFrom:
    #     secretKeyRef:
    #       name: "additional-credentials"
    #       key: username
  podAnnotations: {}
  securityContext: {}
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  service:
    portName: ""

migrations:
  repository: reportportal/migrations
  tag: 5.7.0
  pullPolicy: Always
  resources:
    requests:
      cpu: 100m
      memory: 64Mi
    limits:
      cpu: 100m
      memory: 128Mi
  podAnnotations: {}
  securityContext: {}
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  metadataAnnotations:
    enabled: true
    hooks:
      "helm.sh/hook": "pre-install,pre-upgrade"
      "helm.sh/hook-delete-policy": "before-hook-creation,hook-succeeded"

serviceanalyzer:
  repository: reportportal/service-auto-analyzer
  tag: 5.7.2
  name: analyzer
  pullPolicy: Always
  uwsgiWorkers: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 100m
      memory: 512Mi
  podAnnotations: {}
  securityContext: {}
  ## External environment variables
  ##
  extraEnvs: []
    # - name: EXTRA_ENV
    #   value: "TRUE"
    # - name: EXTRA_ENV_SECRET
    #   valueFrom:
    #     secretKeyRef:
    #       name: "additional-credentials"
    #       key: username
  ##
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  service:
    portName: ""

serviceanalyzertrain:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 512Mi
  podAnnotations: {}
  securityContext: {}
  ## External environment variables
  ##
  extraEnvs: []
    # - name: EXTRA_ENV
    #   value: "TRUE"
    # - name: EXTRA_ENV_SECRET
    #   valueFrom:
    #     secretKeyRef:
    #       name: "additional-credentials"
    #       key: username
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  service:
    portName: ""

metricsgatherer:
  repository: reportportal/service-metrics-gatherer
  tag: 1.1.20
  name: metrics-gatherer
  loggingLevel: debug
  timeManagement:
    starttime: 22:00
    endtime: 08:00
    timezone: Europe/Minsk
  maxDaysStore: 500
  resources:
    requests:
      cpu: 8m
      memory: 128Mi
    limits:
      cpu: 16m
      memory: 256Mi
  podAnnotations: {}
  securityContext: {}
  ## External environment variables
  ##
  extraEnvs: []
    # - name: EXTRA_ENV
    #   value: "TRUE"
    # - name: EXTRA_ENV_SECRET
    #   valueFrom:
    #     secretKeyRef:
    #       name: "additional-credentials"
    #       key: username
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}
  #  disktype: ssd
  serviceAccountName: ""
  service:
    portName: ""

rabbitmq:
  SecretName: ""
  installdep:
    enable: false
  endpoint:
    address: rabbitmq-local.default.svc.cluster.local
    port: 5672
    user: rabbitmq
    apiport: 15672
    apiuser: rabbitmq
    password: rabbitmq
  ## Virtual hosts provide logical grouping and separation of resources.
  ## ref: https://www.rabbitmq.com/vhosts.html
  vhost: analyzer

postgresql:
  SecretName: ""
  installdep:
    enable: false
  endpoint:
    address: postgres-postgresql.default.svc.cluster.local
    port: 5432
    user: rpuser
    dbName: reportportal
    ## Whether or not to use SSL
    ssl: disable
    # ssl: disable / require
    password: rppass
    ## Number of database connections
    connections:

elasticsearch:
  secretName: ""
  installdep:
    enable: false
  endpoint: http://elasticsearch-master.default.svc.cluster.local:9200
  user: elastic
  password:

minio:
  secretName: ""
  enabled: true
  installdep:
    enable: false
  endpoint: http://minio-local.default.svc.cluster.local:9000
  endpointshort: minio-local-minio.default.svc.cluster.local:9000
  region: ""
  accesskey: minio
  secretkey: minio123
  accesskeyName: "access-key"
  secretkeyName: "secret-key"
  bucketPrefix: ""
  defaultBucketName: ""
  integrationSaltPath: ""

# Ingress configuration for the ui
# If you have installed ingress controller and want to expose application - set INGRESS.ENABLE to true.
# If you have some domain name set INGRESS.USEDOMAINNAME variable to true and set this fqdn to INGRESS.HOSTS
# If you don't have any domain names - set INGRESS.USEDOMAINNAME to false
ingress:
  enable: true
  # IF YOU HAVE SOME DOMAIN NAME SET INGRESS.USEDOMAINNAME to true
  usedomainname: false
  hosts:
    - 10.102.170.96
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/x-forwarded-prefix: /$1
    nginx.ingress.kubernetes.io/proxy-body-size: 128m
    nginx.ingress.kubernetes.io/proxy-buffer-size: 512k
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
    nginx.ingress.kubernetes.io/proxy-busy-buffers-size: 512k
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "8000"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "4000"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "4000"
  tls: []
  # - hosts:
  #   - reportportal.k8.com
  #   secretName: reportportal.k8.com-tls

# tolerations for all components, if any (requires Kubernetes >= 1.6)
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
tolerations: []
  # - key: "key"
  #   operator: "Equal|Exists"
  #   value: "value"
  #   effect: "NoSchedule|PreferNoSchedule|NoExecute"

# RBAC is required for service-index in order to collect status/info over all services
rbac:
  create: true
  serviceAccount:
    create: true
    name: reportportal
    # For AWS IAM role association use the following annotations
    ## See: https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html
    annotations: {}

rp:
  infoEndpoint: "/info"
  healthEndpoint: "/health"

# Extra init containers to e.g. wait for minio
extraInitContainers: {}
  # - name: "wait-for-minio"
  #   image: "busybox"
  #   imagePullPolicy: "IfNotPresent"
  #   command:
  #     - sh
  #     - "-c"
  #     - "for i in `seq 1 300`; do sleep 1; if wget http://<minio-release-name>-minio.default.svc.cluster.local:9000/minio/health/live -q -O /dev/null ; then exit 0; fi; done; exit 1"

I have even tried increasing 'initialDelaySeconds' in the readinessProbe, but it did not help.
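
For reference, the chart exposes the probe budget through the serviceapi.readinessProbe keys shown in the values.yaml above; a minimal sketch of a more generous override (illustrative numbers, assuming the chart passes these keys straight into the pod spec):

```yaml
serviceapi:
  readinessProbe:
    initialDelaySeconds: 300   # give the JVM several minutes before the first check
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 30       # keep probing for a while instead of giving up early
```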

@ci-operator

Check the logs of your "jobs" pod.
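
For reference, a couple of standard ways to do that (the component label is the one the chart sets on the jobs pods):

```bash
# Tail the jobs container logs by label
kubectl logs -l component=report-reportportal-jobs --tail=200

# Or inspect a specific pod, including the previously crashed container if it restarted
kubectl logs <jobs-pod-name> --previous
```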

@manojguptavonage
Author

manojguptavonage commented Jan 13, 2023

There are no logs for this pod; it looks like the container is not running. Below is the description.


Name:             report-reportportal-jobs-fffdcb684-w6dzk
Namespace:        default
Priority:         0
Service Account:  default
Node:             minikube/192.168.49.2
Start Time:       Fri, 13 Jan 2023 16:13:29 +0000
Labels:           component=report-reportportal-jobs
                  pod-template-hash=fffdcb684
Annotations:      <none>
Status:           Running
IP:               172.17.0.9
IPs:
  IP:           172.17.0.9
Controlled By:  ReplicaSet/report-reportportal-jobs-fffdcb684
Containers:
  jobs:
    Container ID:   docker://e41ee59ae89917c3d1cee3f3be7122a3842a8c16c3f6f5c76ccc3eb72015403f
    Image:          reportportal/service-jobs:5.7.3
    Image ID:       docker-pullable://reportportal/service-jobs@sha256:97d4fa14be580f6047a93a57c9f144030b4ac4c28557f2d8b27ec40cd14791f0
    Port:           8686/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 13 Jan 2023 16:13:52 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  372Mi
    Requests:
      cpu:      100m
      memory:   248Mi
    Liveness:   http-get http://:8686/health delay=60s timeout=5s period=40s #success=1 #failure=10
    Readiness:  http-get http://:8686/health delay=120s timeout=20s period=80s #success=1 #failure=20
    Environment:
      RP_ENVIRONMENT_VARIABLE_CLEAN_ATTACHMENT_CRON:    0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_LOG_CRON:           0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_LAUNCH_CRON:        0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_STORAGE_CRON:       0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_STORAGE_PROJECT_CRON:     0 */5 * * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_STORAGE_CHUNKSIZE:  1000
      RP_AMQP_ANALYZER-VHOST:                           analyzer
      RP_AMQP_PASS:                                     rabbitmq
      RP_AMQP_API_ADDRESS:                              http://rabbitmq:$(RP_AMQP_PASS)@rabbitmq-local-rabbitmq.default.svc.cluster.local:15672/api
      RP_AMQP_ADDRESSES:                                amqp://rabbitmq:$(RP_AMQP_PASS)@rabbitmq-local-rabbitmq.default.svc.cluster.local:5672
      RP_DB_HOST:                                       postgres-postgresql.default.svc.cluster.local
      RP_DB_PORT:                                       5432
      RP_DB_NAME:                                       reportportal
      RP_DB_USER:                                       rpuser
      RP_DB_PASS:                                       rppass
      DATASTORE_TYPE:                                   minio
      DATASTORE_MINIO_ENDPOINT:                         http://minio-local-minio.default.svc.cluster.local:9000
      DATASTORE_MINIO_ACCESSKEY:                        minio
      DATASTORE_MINIO_SECRETKEY:                        minio123
      RP_PROCESSING_LOG_MAXBATCHSIZE:                   2000
      RP_PROCESSING_LOG_MAXBATCHTIMEOUT:                6000
      RP_AMQP_MAXLOGCONSUMER:                           1
      RP_ELASTICSEARCH_HOST:                            http://elasticsearch-master.default.svc.cluster.local:9200
      RP_ELASTICSEARCH_USERNAME:                        elastic
      RP_ELASTICSEARCH_PASSWORD:                        
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8787p (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-8787p:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  8m                   default-scheduler  Successfully assigned default/report-reportportal-jobs-fffdcb684-w6dzk to minikube
  Normal   Pulling    7m59s                kubelet            Pulling image "reportportal/service-jobs:5.7.3"
  Normal   Pulled     7m37s                kubelet            Successfully pulled image "reportportal/service-jobs:5.7.3" in 22.082734802s
  Normal   Created    7m37s                kubelet            Created container jobs
  Normal   Started    7m37s                kubelet            Started container jobs
  Warning  Unhealthy  80s (x4 over 5m20s)  kubelet            Readiness probe failed: Get "http://172.17.0.9:8686/health": dial tcp 172.17.0.9:8686: connect: connection refused
  Warning  Unhealthy  40s (x9 over 6m)     kubelet            Liveness probe failed: Get "http://172.17.0.9:8686/health": dial tcp 172.17.0.9:8686: connect: connection refused

@manojguptavonage
Author

Here are the events and diagnostics from RabbitMQ, in case they are related:

  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  17m   default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         17m   default-scheduler  Successfully assigned default/rabbitmq-local-0 to minikube
  Normal   Pulling           17m   kubelet            Pulling image "docker.io/bitnami/rabbitmq:3.8.5-debian-10-r38"
  Normal   Pulled            13m   kubelet            Successfully pulled image "docker.io/bitnami/rabbitmq:3.8.5-debian-10-r38" in 3m30.852012846s
  Normal   Created           13m   kubelet            Created container rabbitmq
  Normal   Started           13m   kubelet            Started container rabbitmq
  Warning  Unhealthy         13m   kubelet            Readiness probe failed: Error: unable to perform an operation on node 'rabbit@rabbitmq-local-0.rabbitmq-local-headless.default.svc.cluster.local'. Please see diagnostics information and suggestions below.

Most common reasons for this are:

 * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
 * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
 * Target node is not running

In addition to the diagnostics info below:

 * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
 * Consult server logs on node rabbit@rabbitmq-local-0.rabbitmq-local-headless.default.svc.cluster.local
 * If target node is configured to use long node names, don't forget to use --longnames with CLI tools

DIAGNOSTICS
===========

attempted to contact: ['rabbit@rabbitmq-local-0.rabbitmq-local-headless.default.svc.cluster.local']

rabbit@rabbitmq-local-0.rabbitmq-local-headless.default.svc.cluster.local:
  * connected to epmd (port 4369) on rabbitmq-local-0.rabbitmq-local-headless.default.svc.cluster.local
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on rabbitmq-local-0.rabbitmq-local-headless.default.svc.cluster.local
  * suggestion: start the node

Current node details:
 * node name: 'rabbitmqcli-661-rabbit@rabbitmq-local-0.rabbitmq-local-headless.default.svc.cluster.local'
 * effective user's home directory: /opt/bitnami/rabbitmq/.rabbitmq
 * Erlang cookie hash: 7gnpXKVpgoGuHQl+OyBdng==
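
For reference, the FailedScheduling event about an unbound PersistentVolumeClaim together with "node 'rabbit' not running at all" usually just means the broker had not finished starting when the probe ran; a few standard commands that can confirm whether RabbitMQ recovered (names taken from the output above):

```bash
# Was the PVC bound, and is the broker pod ready now?
kubectl get pvc
kubectl get pod rabbitmq-local-0

# Broker logs and its own status check
kubectl logs rabbitmq-local-0 --tail=100
kubectl exec -it rabbitmq-local-0 -- rabbitmqctl status
```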

@manojguptavonage
Author

@DzmitryHumianiuk Could you please help here?

@manojguptavonage
Author

kubectl logs -f report-reportportal-jobs-c448cc65-lqvm5


  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::               (v2.5.14)

2023-01-13 17:40:48.200  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : Starting ServiceJobApplication using Java 11.0.17 on report-reportportal-jobs-c448cc65-lqvm5 with PID 1 (/service-jobs-5.7.3-exec.jar started by root in /)
2023-01-13 17:40:48.899  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : No active profile set, falling back to 1 default profile: "default"
2023-01-13 17:45:28.004  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8686 (http)
2023-01-13 17:45:30.203  INFO 1 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2023-01-13 17:45:30.212  INFO 1 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.63]
2023-01-13 17:45:44.493  INFO 1 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2023-01-13 17:45:44.687  INFO 1 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 286797 ms

@manojguptavonage
Author

@HardNorth Can you please check this issue? The jobs, UAT, and API pods are not getting ready.

I have been stuck on this for the last two weeks.

@manojguptavonage
Author


  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::               (v2.5.14)

2023-01-20 17:22:59.256  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : Starting ServiceJobApplication using Java 11.0.17 on report-reportportal-jobs-868f78bf95-8d49z with PID 1 (/service-jobs-5.7.3-exec.jar started by root in /)
2023-01-20 17:23:00.049  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : No active profile set, falling back to 1 default profile: "default"
2023-01-20 17:28:10.247  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8686 (http)
2023-01-20 17:28:12.543  INFO 1 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2023-01-20 17:28:12.544  INFO 1 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.63]
2023-01-20 17:28:28.749  INFO 1 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2023-01-20 17:28:28.844  INFO 1 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 317298 ms
2023-01-20 17:30:26.250  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2023-01-20 17:30:40.748  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2023-01-20 17:34:40.846  INFO 1 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 2 endpoint(s) beneath base path ''
2023-01-20 17:34:55.948  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8686 (http) with context path ''
2023-01-20 17:34:56.248  INFO 1 --- [           main] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: rabbitmq-local.default.svc.cluster.local:5672
2023-01-20 17:35:02.947  INFO 1 --- [           main] o.s.a.r.l.SimpleMessageListenerContainer : Broker not available; cannot force queue declarations during start: java.util.concurrent.TimeoutException
2023-01-20 17:35:02.951  WARN 1 --- [103.32.165:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Socket closed)
2023-01-20 17:35:05.046  INFO 1 --- [ntContainer#0-1] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: rabbitmq-local.default.svc.cluster.local:5672
2023-01-20 17:35:06.644  INFO 1 --- [ntContainer#0-1] o.s.a.r.c.CachingConnectionFactory       : Created new connection: processingConnectionFactory#44841b43:1/SimpleConnection@650d719c [delegate=amqp://[email protected]:5672/, localPort= 47624]
2023-01-20 17:35:18.448  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : Started ServiceJobApplication in 852.992 seconds (JVM running for 943.98)
2023-01-20 17:35:41.448  INFO 1 --- [nio-8686-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2023-01-20 17:35:41.448  INFO 1 --- [nio-8686-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2023-01-20 17:35:41.846  INFO 1 --- [nio-8686-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 397 ms
2023-01-20 17:35:56.548  INFO 1 --- [nio-8686-exec-1] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: rabbitmq-local.default.svc.cluster.local:5672
2023-01-20 17:35:58.347  INFO 1 --- [nio-8686-exec-1] o.s.a.r.c.CachingConnectionFactory       : Created new connection: analyzerConnectionFactory#78226c36:0/SimpleConnection@6f02d3f0 [delegate=amqp://[email protected]:5672/analyzer, localPort= 48056]
2023-01-20 17:36:24.245  INFO 1 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Waiting for workers to finish.
2023-01-20 17:36:25.145  INFO 1 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.
2023-01-20 17:36:40.149  INFO 1 --- [ionShutdownHook] o.s.a.r.l.SimpleMessageListenerContainer : Shutdown ignored - container is already stopped
2023-01-20 17:36:40.846  INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2023-01-20 17:36:41.746  INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.

@manojguptavonage
Author

Name:             report-reportportal-jobs-868f78bf95-8d49z
Namespace:        default
Priority:         0
Service Account:  default
Node:             minikube/192.168.49.2
Start Time:       Fri, 20 Jan 2023 17:19:31 +0000
Labels:           component=report-reportportal-jobs
                  pod-template-hash=868f78bf95
Annotations:      <none>
Status:           Running
IP:               172.17.0.7
IPs:
  IP:           172.17.0.7
Controlled By:  ReplicaSet/report-reportportal-jobs-868f78bf95
Containers:
  jobs:
    Container ID:   docker://78d5f09543b360960b3b454fae588a4afc93b5daa059f1c7a5b3a2280cb10f37
    Image:          reportportal/service-jobs:5.7.3
    Image ID:       docker-pullable://reportportal/service-jobs@sha256:97d4fa14be580f6047a93a57c9f144030b4ac4c28557f2d8b27ec40cd14791f0
    Port:           8686/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 20 Jan 2023 17:36:43 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Fri, 20 Jan 2023 17:19:33 +0000
      Finished:     Fri, 20 Jan 2023 17:36:42 +0000
    Ready:          False
    Restart Count:  1
    Limits:
      cpu:     100m
      memory:  372Mi
    Requests:
      cpu:      100m
      memory:   248Mi
    Liveness:   http-get http://:8686/health delay=600s timeout=5s period=40s #success=1 #failure=10
    Readiness:  http-get http://:8686/health delay=600s timeout=5s period=40s #success=1 #failure=10
    Environment:
      RP_ENVIRONMENT_VARIABLE_CLEAN_ATTACHMENT_CRON:    0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_LOG_CRON:           0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_LAUNCH_CRON:        0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_STORAGE_CRON:       0 0 */24 * * *
      RP_ENVIRONMENT_VARIABLE_STORAGE_PROJECT_CRON:     0 */5 * * * *
      RP_ENVIRONMENT_VARIABLE_CLEAN_STORAGE_CHUNKSIZE:  1000
      RP_AMQP_ANALYZER-VHOST:                           analyzer
      RP_AMQP_PASS:                                     <set to the key 'rabbitmq-password' in secret 'rabbitmq-local'>  Optional: false
      RP_AMQP_API_ADDRESS:                              http://rabbitmq:$(RP_AMQP_PASS)@rabbitmq-local.default.svc.cluster.local:15672/api
      RP_AMQP_ADDRESSES:                                amqp://rabbitmq:$(RP_AMQP_PASS)@rabbitmq-local.default.svc.cluster.local:5672
      RP_DB_HOST:                                       postgres-postgresql.default.svc.cluster.local
      RP_DB_PORT:                                       5432
      RP_DB_NAME:                                       reportportal
      RP_DB_USER:                                       rpuser
      RP_DB_PASS:                                       <set to the key 'postgresql-password' in secret 'postgres-postgresql'>  Optional: false
      DATASTORE_TYPE:                                   minio
      DATASTORE_MINIO_ENDPOINT:                         http://minio-local.default.svc.cluster.local:9000
      DATASTORE_MINIO_ACCESSKEY:                        <set to the key 'access-key' in secret 'minio-local'>  Optional: false
      DATASTORE_MINIO_SECRETKEY:                        <set to the key 'secret-key' in secret 'minio-local'>  Optional: false
      RP_PROCESSING_LOG_MAXBATCHSIZE:                   2000
      RP_PROCESSING_LOG_MAXBATCHTIMEOUT:                6000
      RP_AMQP_MAXLOGCONSUMER:                           1
      RP_ELASTICSEARCH_HOST:                            http://elasticsearch-master.default.svc.cluster.local:9200
      RP_ELASTICSEARCH_USERNAME:                        elastic
      RP_ELASTICSEARCH_PASSWORD:                        
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wsthb (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-wsthb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  30m                  default-scheduler  Successfully assigned default/report-reportportal-jobs-868f78bf95-8d49z to minikube
  Warning  Failed     30m                  kubelet            Error: failed to sync secret cache: timed out waiting for the condition
  Normal   Started    30m                  kubelet            Started container jobs
  Warning  Unhealthy  13m (x2 over 14m)    kubelet            Liveness probe failed: Get "http://172.17.0.7:8686/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  13m (x2 over 14m)    kubelet            Readiness probe failed: Get "http://172.17.0.7:8686/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    13m                  kubelet            Container jobs failed liveness probe, will be restarted
  Normal   Pulled     13m (x3 over 30m)    kubelet            Container image "reportportal/service-jobs:5.7.3" already present on machine
  Normal   Created    13m (x2 over 30m)    kubelet            Created container jobs
  Warning  Unhealthy  3m3s (x9 over 19m)   kubelet            Liveness probe failed: Get "http://172.17.0.7:8686/health": dial tcp 172.17.0.7:8686: connect: connection refused
  Warning  Unhealthy  3m3s (x10 over 19m)  kubelet            Readiness probe failed: Get "http://172.17.0.7:8686/health": dial tcp 172.17.0.7:8686: connect: connection refused

@manojguptavonage
Author

manojguptavonage commented Jan 23, 2023


  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::               (v2.5.14)

2023-01-23 14:10:17.268  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : Starting ServiceJobApplication using Java 11.0.17 on report-reportportal-jobs-868f78bf95-5454c with PID 1 (/service-jobs-5.7.3-exec.jar started by root in /)
2023-01-23 14:10:17.866  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : No active profile set, falling back to 1 default profile: "default"
2023-01-23 14:15:36.665  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8686 (http)
2023-01-23 14:15:38.668  INFO 1 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2023-01-23 14:15:38.764  INFO 1 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.63]
2023-01-23 14:15:52.268  INFO 1 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2023-01-23 14:15:52.465  INFO 1 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 327196 ms
2023-01-23 14:17:52.650  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2023-01-23 14:18:09.646  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2023-01-23 14:22:24.644  INFO 1 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 2 endpoint(s) beneath base path ''
2023-01-23 14:22:41.347  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8686 (http) with context path ''
2023-01-23 14:22:41.649  INFO 1 --- [           main] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: rabbitmq-local.default.svc.cluster.local:5672
2023-01-23 14:22:49.651  INFO 1 --- [           main] o.s.a.r.l.SimpleMessageListenerContainer : Broker not available; cannot force queue declarations during start: java.util.concurrent.TimeoutException
2023-01-23 14:22:51.648  WARN 1 --- [103.32.165:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Socket closed)
2023-01-23 14:22:55.146  INFO 1 --- [ntContainer#0-1] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: rabbitmq-local.default.svc.cluster.local:5672
2023-01-23 14:23:00.547  INFO 1 --- [ntContainer#0-1] o.s.a.r.c.CachingConnectionFactory       : Created new connection: processingConnectionFactory#f4c0e4e:1/SimpleConnection@1c1e4807 [delegate=amqp://[email protected]:5672/, localPort= 33454]
2023-01-23 14:23:12.650  INFO 1 --- [nio-8686-exec-2] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2023-01-23 14:23:12.949  INFO 1 --- [nio-8686-exec-2] o.s.web.servlet.DispatcherServlet        : Completed initialization in 201 ms
2023-01-23 14:23:18.546  INFO 1 --- [           main] c.e.reportportal.ServiceJobApplication   : Started ServiceJobApplication in 893.602 seconds (JVM running for 988.991)
2023-01-23 14:23:22.449  INFO 1 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Waiting for workers to finish.
2023-01-23 14:23:22.450  INFO 1 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.

The jobs pod is restarted after this.
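
For what it's worth, the log above shows the application needing roughly 15 minutes to reach "Started ServiceJobApplication", which is right at the edge of the liveness budget, and the events above show the kubelet restarting the container after failed liveness probes. At the plain Kubernetes level the usual protection for a slow-starting container is a startupProbe that defers the liveness/readiness checks; a minimal pod-spec sketch (raw Kubernetes fields, not values the ReportPortal chart necessarily exposes):

```yaml
containers:
  - name: jobs
    # ...
    startupProbe:                 # holds off liveness/readiness until it succeeds once
      httpGet:
        path: /health
        port: 8686
      periodSeconds: 30
      failureThreshold: 40        # up to 40 x 30s = 20 minutes of startup grace
    livenessProbe:
      httpGet:
        path: /health
        port: 8686
      periodSeconds: 40
      failureThreshold: 10
```

That said, a ~15-minute Spring Boot startup on a container capped at cpu: 100m usually points to CPU starvation or architecture emulation rather than probe tuning alone, which is where the discussion below ends up.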

@manojguptavonage
Author

@ci-operator can you please check this?

@aquadrehz

@manojguptavonage
It seems to be the same issue I'm facing.
Are you running it locally on a Mac M1 with minikube?

@aquadrehz

I think I have a lead on this.
I suspect it is caused by Docker picking the wrong image architecture.

I tried checking the running container's architecture with this command, using the linux/arm64 image of reportportal/service-api:5.10.0:

docker run -it --name service-api reportportal/service-api@sha256:0b6ede0320738b9732e798eeaf7583ad4feb8f9db6497309a8addbe9c08f7883 uname -m

It will probably fail and shut down shortly after starting,
so execute this command while the container is still up:

docker exec -ti -u 0 service-api uname -m 
# This container identifies as amd64
x86_64

I also checked the manifest, which seems fine, with this command:

docker manifest inspect reportportal/service-api:5.10.0

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.oci.image.index.v1+json",
   "manifests": [
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 1058,
         "digest": "sha256:0fd2b28c7faf2e80ff0e30e1e5ec00a812a0b39595f51013d2b69de878f48a50",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 1058,
         "digest": "sha256:0b6ede0320738b9732e798eeaf7583ad4feb8f9db6497309a8addbe9c08f7883",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 567,
         "digest": "sha256:7d547c580a3e3ba3905e9770dfce9b3d6fdb2f2c3a134ada80eeb0f7d12ef4fa",
         "platform": {
            "architecture": "unknown",
            "os": "unknown"
         }
      },
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 567,
         "digest": "sha256:52f83233aa942d8592da73a0f47cb591ac339749ef870ebf6da7e33141d30e28",
         "platform": {
            "architecture": "unknown",
            "os": "unknown"
         }
      }
   ]

}

The manifest also seems correct.
Do you have any idea why it picked amd64, while I'm running both Colima and minikube as arm64 on a Mac M1?
@raikbitters
@DzmitryHumianiuk

@DzmitryHumianiuk
Member

DzmitryHumianiuk commented Nov 13, 2023

This probably goes to @raikbitters more than to me.

@aquadrehz just wondering, is there any issue on the Colima side for this? AFAIK it picks the right image with Docker Desktop.

@aquadrehz

I see. I will wait for @raikbitters.

But I tried another multi-arch image, alpine, and it picked the correct one, aarch64.
That is useful info, but I'm not sure this is exactly a Colima issue.

docker manifest inspect alpine
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:48d9183eb12a05c99bcc0bf44a003607b8e941e1d4f41f9ad12bdcc4b5672f86",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:777e2106170c66742ddbe77f703badb7dc94d9a5b1dc2c4a01538fad9aef56bb",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v6"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:9909a171ac287c316f771a4f1d1384df96957ed772bc39caf6deb6e3e360316f",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:6ce9a9a256a3495ae60ab0059ed1c7aee5ee89450477f2223f6ea7f6296df555",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:664888ac9cfd28068e062c991ebcff4b4c7307dc8dd4df9e728bedde5c449d91",
         "platform": {
            "architecture": "386",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:39ac038d955f9d66f98b998aa52843493c03f0373888a988f25e5ed039949aff",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 528,
         "digest": "sha256:4bfee8cb4c809bcb21aa730ce26a2302d3888002ff2d70ec5fd569e7ffdc1a7d",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}
docker images
REPOSITORY                    TAG       IMAGE ID       CREATED        SIZE
reportportal/service-api      <none>    e870b6e138d8   6 weeks ago    618MB
alpine                        latest    3cc203321400   6 weeks ago    7.66MB
gcr.io/k8s-minikube/kicbase   v0.0.40   f52519afe5f6   4 months ago   1.1GB
docker run -it alpine uname -m

aarch64

Anyway, it would be nice if you have additional suggestions to improve my approach, @DzmitryHumianiuk.

@raikbitters
Contributor

raikbitters commented Nov 14, 2023

@aquadrehz hi. Have you tried this solution: docker pull --platform linux/arm64?
You can also build a Docker image from source on your machine and test it.

Can a virtual machine emulate the AMD architecture on macOS for Docker?
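
For reference, one way to test that end-to-end on minikube would be to pull the arm64 variant explicitly, verify it, and side-load it into the cluster; a minimal sketch (the chart's pullPolicy: Always shown earlier would need to become IfNotPresent so the side-loaded image is actually used):

```bash
# Pull the arm64 variant explicitly and confirm what was pulled
docker pull --platform linux/arm64 reportportal/service-api:5.10.0
docker image inspect reportportal/service-api:5.10.0 --format '{{.Architecture}}'   # expect arm64

# Copy the image into the minikube node's container runtime
minikube image load reportportal/service-api:5.10.0
```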

@aquadrehz

@raikbitters
I already pulled and built locally, but I still get the same result when inspecting the manifest.
I expected to get only a single arm64 image, but I got four sub-images (amd64 included).

docker manifest inspect reportportal/service-api

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.oci.image.index.v1+json",
   "manifests": [
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 1058,
         "digest": "sha256:6e79a86316f9622cfb0c7a541cc5d0b4c8de90139471a8a6f54bd298468baed6",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 1058,
         "digest": "sha256:61cd7e04fd9f1c1a65d45be7950c23431fee30cc85048bdfe7b14d191c6d6884",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 567,
         "digest": "sha256:b70a8f7e2d55a0342ad186efc72f8a64dbc5e52b31049a8e3da5f0fef68af77b",
         "platform": {
            "architecture": "unknown",
            "os": "unknown"
         }
      },
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 567,
         "digest": "sha256:b7d3004291bb9e8278720d87808f2270909e8d88676d53fa31c0f50eba64726e",
         "platform": {
            "architecture": "unknown",
            "os": "unknown"
         }
      }
   ]
}

It can emulate the AMD architecture on macOS.
Both Colima and minikube are already running as arm64.

Do you know why it always picks the amd64 one?
I suspect that is the root cause.
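
For reference, it may also be worth checking which architecture the minikube node itself advertises, since that is what the container runtime uses when it selects a manifest from the index (assuming the default node name minikube):

```bash
# Architecture the Kubernetes node reports
kubectl get node minikube -o jsonpath='{.status.nodeInfo.architecture}{"\n"}'

# Architecture the node is actually running
minikube ssh -- uname -m
```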
