Added readiness and liveness probes #8488

Open · wants to merge 6 commits into base: develop

Contributor:
I think you also need to add an __init__.py to commands.
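
For reference, Django discovers management commands through a package hierarchy, so the commands directory conventionally needs its own (empty) __init__.py. A sketch of the expected layout, based on the path added in this PR:

cvat/apps/health/management/__init__.py
cvat/apps/health/management/commands/__init__.py
cvat/apps/health/management/commands/workerprobe.py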

Empty file.
29 changes: 29 additions & 0 deletions cvat/apps/health/management/commands/workerprobe.py
@@ -0,0 +1,29 @@
import os
import platform
from datetime import datetime
from django.core.management.base import BaseCommand, CommandError
from django.conf import settings
from rq.worker import Worker
import django_rq


class Command(BaseCommand):
    help = "Check worker liveness in specified queues"

    def add_arguments(self, parser):
        parser.add_argument("queue_names", nargs="+", type=str)

    def handle(self, *args, **options):
        allowed_queue_names = list(q.value for q in settings.CVAT_QUEUES)
        hostname = platform.node()
        for queue_name in options["queue_names"]:
            if queue_name not in allowed_queue_names:

Contributor:
Seems like you could replace this with if queue_name not in settings.RQ_QUEUES, making allowed_queue_names unnecessary.
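
A sketch of that simplification — settings.RQ_QUEUES is the mapping of configured queue names that django_rq reads, so membership in it already answers whether a queue is defined:

        for queue_name in options["queue_names"]:
            # Membership in settings.RQ_QUEUES is enough; the separate
            # allowed_queue_names list can be dropped.
            if queue_name not in settings.RQ_QUEUES:
                raise CommandError(f"Queue {queue_name} is not defined")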

                raise CommandError(f"Queue {queue_name} is not defined")

            queue = django_rq.get_queue(queue_name)

            workers = [w for w in Worker.all(queue.connection) if queue.name in w.queue_names() and w.hostname == hostname]

Contributor:
If you just need the queue for the connection, it would make more sense to get the connection directly (django_rq.get_connection).
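
A sketch of that change — django_rq.get_connection(queue_name) returns the Redis connection configured for the named queue, so the Queue object is no longer needed:

            connection = django_rq.get_connection(queue_name)
            workers = [
                w for w in Worker.all(connection)
                if queue_name in w.queue_names() and w.hostname == hostname
            ]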


            if len(workers) != int(os.getenv("NUMPROCS", 1)) or \
                not all((datetime.now() - w.last_heartbeat).seconds < w.worker_ttl for w in workers):

coderabbitai bot (Sep 30, 2024):
⚠️ Potential issue

Use total_seconds() instead of seconds for accurate time difference calculation

The use of .seconds in the time difference calculation only returns the number of seconds modulo 24 hours, which can lead to incorrect results if the time difference exceeds one day. To ensure the liveness check is accurate regardless of the time difference, use .total_seconds() instead.

Apply this diff to fix the issue:

-                 not all((datetime.now() - w.last_heartbeat).seconds < w.worker_ttl for w in workers):
+                 not all((datetime.now() - w.last_heartbeat).total_seconds() < w.worker_ttl for w in workers):

Contributor:
The rabbit is correct here, though IMO a better fix would be datetime.now() - w.last_heartbeat < timedelta(seconds=w.worker_ttl).

coderabbitai bot:
@SpecLad You're absolutely right! Using datetime.now() - w.last_heartbeat < timedelta(seconds=w.worker_ttl) provides a cleaner and more accurate comparison by directly comparing timedelta objects. Here's the updated code:

not all(datetime.now() - w.last_heartbeat < timedelta(seconds=w.worker_ttl) for w in workers):

Thanks for the suggestion!

(^_^)/

                raise CommandError(f"Unhealthy workers in the {queue_name} queue")

Contributor:
It would be useful to print what exactly is wrong here (expected number vs actual number, etc.).
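
A sketch of what a more informative message could include (expected_count and stale are illustrative local names, and this assumes the timedelta-based comparison suggested above):

                expected_count = int(os.getenv("NUMPROCS", 1))
                stale = [w.name for w in workers
                         if datetime.now() - w.last_heartbeat >= timedelta(seconds=w.worker_ttl)]
                raise CommandError(
                    f"Unhealthy workers in the {queue_name} queue on {hostname}: "
                    f"expected {expected_count} workers, found {len(workers)}; "
                    f"workers with stale heartbeats: {stale or 'none'}"
                )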

2 changes: 1 addition & 1 deletion helm-chart/Chart.yaml
@@ -16,7 +16,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.13.2
+version: 0.14.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
14 changes: 14 additions & 0 deletions helm-chart/templates/cvat_backend/server/deployment.yml
@@ -65,6 +65,20 @@ spec:
          {{- end }}
          ports:
          - containerPort: 8080
          {{- if $localValues.readinessProbe.enabled }}
          readinessProbe:
            httpGet:
              path: /api/server/about
              port: 8080
            {{- toYaml (omit $localValues.readinessProbe "enabled") | nindent 12 }}
          {{- end }}
          {{- if $localValues.livenessProbe.enabled }}
          livenessProbe:
            httpGet:
              path: /api/server/health/
              port: 8080
            {{- toYaml (omit $localValues.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
Comment on lines +68 to +81

coderabbitai bot:
💡 Codebase verification

Missing timing parameters in probe configurations.

The following readinessProbe and livenessProbe definitions are missing essential timing parameters (initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold):

  • helm-chart/templates/cvat_kvrocks/statefulset.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/values.yaml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_opa/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/worker_webhooks/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/worker_import/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/server/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/worker_qualityreports/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/worker_annotation/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/utils/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/worker_analyticsreports/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_frontend/deployment.yml
    • readinessProbe
    • livenessProbe
  • helm-chart/templates/cvat_backend/worker_export/deployment.yml
    • readinessProbe
    • livenessProbe

Please update each probe configuration to include the necessary timing parameters to ensure proper functionality and avoid potential deployment issues.

🔗 Analysis chain

Verify probe timing parameters.

The implementation of both readiness and liveness probes looks good. However, it's important to ensure that appropriate timing parameters (such as initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold) are configured in the $localValues.readinessProbe and $localValues.livenessProbe objects.

These parameters are crucial for fine-tuning the behavior of the probes and avoiding premature restarts or false positives/negatives.

To confirm the presence of these parameters, you can run the following command:

Please review the output to ensure that appropriate timing parameters are set for both probes.


Probe configurations not found in values.yaml.

To locate and verify the timing parameters for both readiness and liveness probes, please run the following command:


Verify the presence of timing parameters in all probe configurations.

To ensure that both readiness and liveness probes have appropriate timing parameters configured, please run the following command:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for probe timing parameters in values.yaml

# Test: Search for readinessProbe and livenessProbe configurations
rg --type yaml 'readinessProbe:|livenessProbe:' -A 10 values.yaml

Length of output: 119


Script:

#!/bin/bash
# Description: Search for readinessProbe and livenessProbe across all YAML files

rg --type yaml '(readinessProbe:|livenessProbe:)' --multiline

Length of output: 2373


Script:

#!/bin/bash
# Description: Check for timing parameters in all readinessProbe and livenessProbe definitions

# Define the timing parameters to check
TIMING_PARAMS=("initialDelaySeconds" "periodSeconds" "timeoutSeconds" "successThreshold" "failureThreshold")

# Search for readinessProbe and livenessProbe in all YAML files
probes=$(rg --type yaml '(readinessProbe:|livenessProbe:)' --multiline)

# Iterate over each probe found
echo "$probes" | while read -r line; do
  file=$(echo "$line" | cut -d':' -f1)
  probe=$(echo "$line" | cut -d':' -f2 | tr -d ' ')
  echo "Checking $probe in $file"
  for param in "${TIMING_PARAMS[@]}"; do
    if ! rg --type yaml "$param:" "$file" >/dev/null; then
      echo "  - Missing $param in $probe"
    fi
  done
done

Length of output: 50850

Contributor:
This liveness probe is IMO not suitable. /api/server/health/ returns the aggregate status of the whole system, which includes services that the server has no control over (like the database and Redis). The purpose of the liveness probe is to tell Kubernetes when to restart the pod, but it doesn't make sense to restart the server when, e.g., the database is down.

I think that we should follow the documentation here:

A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold. This ensures that the pod is observed as not-ready for some period of time before it is hard killed.
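
A sketch of that pattern for the server container, reusing the readiness endpoint from this PR (the threshold values are illustrative assumptions, not part of the PR):

          livenessProbe:
            httpGet:
              path: /api/server/about
              port: 8080
            periodSeconds: 15
            failureThreshold: 10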

volumeMounts:
{{- if not .Values.cvat.backend.disableDistinctCachePerService }}
- mountPath: /home/django/data/cache
13 changes: 11 additions & 2 deletions helm-chart/templates/cvat_backend/utils/deployment.yml
@@ -60,8 +60,17 @@ spec:
          {{- with concat .Values.cvat.backend.additionalEnv $localValues.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
          ports:
          - containerPort: 8080
          {{- if .Values.cvat.backend.worker.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - python
                - manage.py
                - workerprobe
                - notifications
                - cleaning
            {{- toYaml (omit .Values.cvat.backend.worker.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
Comment on lines +63 to +73

coderabbitai bot:
🛠️ Refactor suggestion

Consider differentiating between readiness and liveness probes.

While the addition of the liveness probe is beneficial, using the same command for both readiness and liveness probes might not be ideal. The liveness probe should check if the application is alive, while the readiness probe checks if it's ready to accept traffic.

Consider implementing a simpler check for the liveness probe that doesn't depend on the worker tasks. For example, you could create a simple endpoint that returns a 200 OK status if the application is running. This would prevent unnecessary restarts if the worker tasks are temporarily unavailable.

Example:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: {{ .Values.cvat.backend.worker.livenessProbe.initialDelaySeconds }}
  periodSeconds: {{ .Values.cvat.backend.worker.livenessProbe.periodSeconds }}
  timeoutSeconds: {{ .Values.cvat.backend.worker.livenessProbe.timeoutSeconds }}
  successThreshold: {{ .Values.cvat.backend.worker.livenessProbe.successThreshold }}
  failureThreshold: {{ .Values.cvat.backend.worker.livenessProbe.failureThreshold }}

This assumes you've implemented a /healthz endpoint in your application that performs a basic aliveness check.

volumeMounts:
{{- if not .Values.cvat.backend.disableDistinctCachePerService }}
- mountPath: /home/django/data/cache
helm-chart/templates/cvat_backend/worker_analyticsreports/deployment.yml

@@ -61,6 +61,16 @@ spec:
          {{- with concat .Values.cvat.backend.additionalEnv $localValues.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
          {{- if .Values.cvat.backend.worker.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - python
                - manage.py
                - workerprobe
                - analytics_reports
            {{- toYaml (omit .Values.cvat.backend.worker.livenessProbe "enabled") | nindent 12 }}
          {{- end }}

Contributor:
This section is duplicated everywhere. Can you factor it out into a template?
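
One possible shape for such a helper, assuming it lives in templates/_helpers.tpl; the template name, parameter dict, and indentation are illustrative, not part of this PR:

{{- define "cvat.backend.worker.livenessProbe" -}}
{{- if .probe.enabled }}
livenessProbe:
  exec:
    command:
      - python
      - manage.py
      - workerprobe
      {{- range .queues }}
      - {{ . }}
      {{- end }}
  {{- toYaml (omit .probe "enabled") | nindent 2 }}
{{- end }}
{{- end }}

Each deployment would then reduce to something like:

          {{- include "cvat.backend.worker.livenessProbe" (dict "probe" .Values.cvat.backend.worker.livenessProbe "queues" (list "export")) | nindent 10 }}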

{{- with concat .Values.cvat.backend.additionalVolumeMounts $localValues.additionalVolumeMounts }}
volumeMounts:
{{- toYaml . | nindent 10 }}
helm-chart/templates/cvat_backend/worker_annotation/deployment.yml

@@ -60,6 +60,16 @@ spec:
          {{- with concat .Values.cvat.backend.additionalEnv $localValues.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
          {{- if .Values.cvat.backend.worker.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - python
                - manage.py
                - workerprobe
                - annotation
            {{- toYaml (omit .Values.cvat.backend.worker.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
volumeMounts:
{{- if not .Values.cvat.backend.disableDistinctCachePerService }}
- mountPath: /home/django/data/cache
10 changes: 10 additions & 0 deletions helm-chart/templates/cvat_backend/worker_export/deployment.yml
@@ -61,6 +61,16 @@ spec:
          {{- with concat .Values.cvat.backend.additionalEnv $localValues.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
          {{- if .Values.cvat.backend.worker.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - python
                - manage.py
                - workerprobe
                - export
            {{- toYaml (omit .Values.cvat.backend.worker.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
Comment on lines +64 to +73

coderabbitai bot:
🛠️ Refactor suggestion

Consider differentiating between readiness and liveness probes.

The liveness probe is well-implemented and follows the same pattern as the readiness probe. However, using the same command for both readiness and liveness probes might not be ideal.

Consider the following suggestions:

  1. Differentiate the commands used for readiness and liveness probes. For example, the readiness probe could check if the worker is ready to accept new tasks, while the liveness probe could check if the worker is still running and responsive.
  2. If different commands are not feasible, consider adding parameters to the workerprobe command to distinguish between readiness and liveness checks.

Example:

livenessProbe:
  exec:
    command:
    - python
    - manage.py
    - workerprobe
    - export
    - --liveness  # Add a parameter to distinguish liveness check

This differentiation will provide more accurate health monitoring for your Kubernetes deployment.

volumeMounts:
{{- if not .Values.cvat.backend.disableDistinctCachePerService }}
- mountPath: /home/django/data/cache
10 changes: 10 additions & 0 deletions helm-chart/templates/cvat_backend/worker_import/deployment.yml
@@ -60,6 +60,16 @@ spec:
          {{- with concat .Values.cvat.backend.additionalEnv $localValues.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
          {{- if .Values.cvat.backend.worker.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - python
                - manage.py
                - workerprobe
                - import
            {{- toYaml (omit .Values.cvat.backend.worker.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
volumeMounts:
{{- if not .Values.cvat.backend.disableDistinctCachePerService }}
- mountPath: /home/django/data/cache
helm-chart/templates/cvat_backend/worker_qualityreports/deployment.yml

@@ -60,6 +60,16 @@ spec:
          {{- with concat .Values.cvat.backend.additionalEnv $localValues.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
          {{- if .Values.cvat.backend.worker.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - python
                - manage.py
                - workerprobe
                - quality_reports
            {{- toYaml (omit .Values.cvat.backend.worker.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
{{- with concat .Values.cvat.backend.additionalVolumeMounts $localValues.additionalVolumeMounts }}
volumeMounts:
{{- toYaml . | nindent 10 }}
helm-chart/templates/cvat_backend/worker_webhooks/deployment.yml

@@ -60,6 +60,17 @@ spec:
          {{- with concat .Values.cvat.backend.additionalEnv $localValues.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}

          {{- if .Values.cvat.backend.worker.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - python
                - manage.py
                - workerprobe
                - webhooks
            {{- toYaml (omit .Values.cvat.backend.worker.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
{{- with concat .Values.cvat.backend.additionalVolumeMounts $localValues.additionalVolumeMounts }}
volumeMounts:
{{- toYaml . | nindent 10 }}
12 changes: 12 additions & 0 deletions helm-chart/templates/cvat_frontend/deployment.yml
@@ -45,6 +45,18 @@ spec:
          {{- toYaml . | nindent 10 }}
          {{- end }}
          {{- with .Values.cvat.frontend.additionalVolumeMounts }}
          {{- if .Values.cvat.frontend.readinessProbe.enabled }}
          readinessProbe:
            tcpSocket:
              port: 80
            {{- toYaml (omit .Values.cvat.frontend.readinessProbe "enabled") | nindent 12 }}
          {{- end }}
          {{- if .Values.cvat.frontend.livenessProbe.enabled }}
          livenessProbe:
            tcpSocket:
              port: 80
            {{- toYaml (omit .Values.cvat.frontend.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
volumeMounts:
{{- toYaml . | nindent 10 }}
{{- end }}
19 changes: 19 additions & 0 deletions helm-chart/templates/cvat_kvrocks/statefulset.yml
@@ -59,6 +59,25 @@ spec:
          {{- with .Values.cvat.kvrocks.additionalEnv }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
          # https://github.com/apache/kvrocks/blob/unstable/Dockerfile
          {{- if .Values.cvat.kvrocks.readinessProbe.enabled }}
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - ./bin/redis-cli -p 6666 PING | grep -E '(PONG|NOAUTH)' || exit 1

Contributor:
I don't think || exit 1 does anything useful here. It just overwrites the previous error code.
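
In other words, the probe command could simply end with the grep, whose own exit status already reports failure:

                - ./bin/redis-cli -p 6666 PING | grep -E '(PONG|NOAUTH)'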

            {{- toYaml (omit .Values.cvat.kvrocks.readinessProbe "enabled") | nindent 12 }}
          {{- end }}
          {{- if .Values.cvat.kvrocks.livenessProbe.enabled }}
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - ./bin/redis-cli -p 6666 PING | grep -E '(PONG|NOAUTH)' || exit 1
            {{- toYaml (omit .Values.cvat.kvrocks.livenessProbe "enabled") | nindent 12 }}
          {{- end }}
Comment on lines +72 to +80

coderabbitai bot:
🛠️ Refactor suggestion

Consider differentiating the liveness probe from the readiness probe

While the liveness probe implementation is functional, it's identical to the readiness probe. This approach might not be ideal:

  1. Readiness and liveness probes serve different purposes:

    • Readiness determines if the pod is ready to receive traffic.
    • Liveness determines if the pod needs to be restarted.
  2. Using the same check for both could lead to unnecessary restarts if the pod is temporarily unable to serve requests but is still running.

  3. The current check might be too strict for a liveness probe, as it requires the Redis server to be fully operational.

Consider implementing a simpler liveness probe that checks if the kvrocks process is running, rather than checking the Redis server's response. For example:

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - pgrep kvrocks || exit 1
  {{- toYaml (omit .Values.cvat.kvrocks.livenessProbe "enabled") | nindent 12 }}

This change would make the liveness probe less likely to cause unnecessary restarts while still detecting if the kvrocks process has crashed.

volumeMounts:
- name: {{ .Release.Name }}-kvrocks-data
mountPath: /var/lib/kvrocks/data
12 changes: 12 additions & 0 deletions helm-chart/templates/cvat_opa/deployment.yml
@@ -53,6 +53,18 @@ spec:
          env:
          {{- toYaml . | nindent 10 }}
          {{- end }}
          {{- if .Values.cvat.opa.readinessProbe.enabled }}
          readinessProbe:
            tcpSocket:
              port: 8181
            {{- toYaml (omit .Values.cvat.opa.readinessProbe "enabled") | nindent 12 }}
          {{- end }}
          {{- if .Values.cvat.opa.livenessProbe.enabled }}
          livenessProbe:
            tcpSocket:
              port: 8181
            {{- toYaml (omit .Values.cvat.opa.livenessProbe "enabled") | nindent 12 }}
          {{- end }}

Contributor (on the tcpSocket readiness check):

OPA has an actual health endpoint, so perhaps you could use that?
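
A sketch of what that could look like — OPA serves a GET /health endpoint on its API port (the probe parameters stay as configured; this is illustrative, not part of the PR):

          readinessProbe:
            httpGet:
              path: /health
              port: 8181
            {{- toYaml (omit .Values.cvat.opa.readinessProbe "enabled") | nindent 12 }}
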
{{- with .Values.cvat.opa.additionalVolumeMounts }}
volumeMounts:
{{- toYaml . | nindent 10 }}
34 changes: 34 additions & 0 deletions helm-chart/values.yaml
@@ -42,7 +42,21 @@ cvat:
      additionalEnv: []
      additionalVolumes: []
      additionalVolumeMounts: []
      readinessProbe:
        enabled: true
        periodSeconds: 15
        initialDelaySeconds: 15
      livenessProbe:
        enabled: true
        periodSeconds: 15
        failureThreshold: 10
        initialDelaySeconds: 60
    worker:
      livenessProbe:
        enabled: true
        periodSeconds: 120
        initialDelaySeconds: 30
        timeoutSeconds: 10
      export:
        replicas: 2
        labels: {}
@@ -172,6 +186,10 @@ cvat:
    # - mountPath: /tmp
    #   name: tmp
    #   subPath: test
    readinessProbe:
      enabled: true
    livenessProbe:
      enabled: true
Comment on lines +189 to +192

coderabbitai bot:
⚠️ Potential issue

Frontend probe configuration needs more specific parameters.

While enabling readiness and liveness probes for the frontend is a good practice, the current configuration lacks specific parameters. This may lead to using default values that might not be optimal for a frontend service.

Consider adding more specific parameters to the frontend probes. Here's a suggested configuration:

readinessProbe:
  enabled: true
  httpGet:
    path: /
    port: 80
  periodSeconds: 10
  initialDelaySeconds: 30
  failureThreshold: 3

livenessProbe:
  enabled: true
  httpGet:
    path: /
    port: 80
  periodSeconds: 30
  initialDelaySeconds: 60
  failureThreshold: 3

These settings provide a more tailored approach for a frontend service, with appropriate paths, ports, and timing parameters.

Contributor:
Why don't you have other options here, like in the other probe settings?

    service:
      type: ClusterIP
      ports:

@@ -216,6 +234,14 @@ cvat:
    #   name: tmp
    #   subPath: test
    composeCompatibleServiceName: true # Sets service name to opa in order to be compatible with Docker Compose. Necessary because changing IAM_OPA_DATA_URL via environment variables in current images. Hinders multiple deployment due to duplicate name
    readinessProbe:
      enabled: true
      periodSeconds: 15
      initialDelaySeconds: 15
    livenessProbe:
      enabled: true
      periodSeconds: 15
      initialDelaySeconds: 15
    service:
      type: ClusterIP
      ports:
@@ -266,6 +292,14 @@ cvat:
    # - mountPath: /tmp
    #   name: tmp
    #   subPath: test
    readinessProbe:
      enabled: true
      periodSeconds: 10
      initialDelaySeconds: 30
    livenessProbe:
      enabled: true
      periodSeconds: 10
      initialDelaySeconds: 30
    defaultStorage:
      enabled: true
      # storageClassName: default
3 changes: 2 additions & 1 deletion supervisord/utils.conf
@@ -24,13 +24,14 @@ command=%(ENV_HOME)s/wait_for_deps.sh
-i 30 --path %(ENV_HOME)s
environment=VECTOR_EVENT_HANDLER="SynchronousLogstashHandler"
numprocs=1
autorestart=true

[program:rqworker-notifications]
command=%(ENV_HOME)s/wait_for_deps.sh
python3 %(ENV_HOME)s/manage.py rqworker -v 3 notifications
--worker-class cvat.rqworker.DefaultWorker
environment=VECTOR_EVENT_HANDLER="SynchronousLogstashHandler",CVAT_POSTGRES_APPLICATION_NAME="cvat:worker:notifications"
numprocs=1
numprocs=%(ENV_NUMPROCS)s
autorestart=true

[program:rqworker-cleaning]
19 changes: 17 additions & 2 deletions tests/python/shared/fixtures/init.py
@@ -240,11 +240,26 @@ def kube_restore_clickhouse_db():


def docker_restore_redis_inmem():
    docker_exec_redis_inmem(["redis-cli", "-e", "flushall"])
    docker_exec_redis_inmem(
        [
            "sh",
            "-c",
            "for p in rq:finished:* rq:job:* rq:wip:* rq:finished:* rq:failed:*; "
            'do redis-cli -e --scan --pattern "$p" | xargs -r redis-cli -e del ; done',

Contributor:
Could you explain what you did here?

        ]
    )


def kube_restore_redis_inmem():
    kube_exec_redis_inmem(["sh", "-c", 'redis-cli -e -a "${REDIS_PASSWORD}" flushall'])
    kube_exec_redis_inmem(
        [
            "sh",
            "-c",
            "for p in rq:finished:* rq:job:* rq:wip:* rq:finished:* rq:failed:*; "
            'do redis-cli -e -a "${REDIS_PASSWORD}" --scan --pattern "$p" | '
            'xargs -r redis-cli -e -a "${REDIS_PASSWORD}" del ; done',
        ]
    )


def docker_restore_redis_ondisk():