Workers
A worker is used to deploy operators at scale. All the worker files can be found in the src/worker/ folder.
Video Worker
This is a test worker that starts a RabbitMQ queue; the video operator is run on the file and the output vector is stored in Elasticsearch.
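At a high level, the worker follows a consume, run operator, index pattern. A minimal illustrative sketch of that loop (assuming the pika and elasticsearch Python clients; the queue name, index name, payload shape, and operator call are placeholders, not Feluda's actual names):
import json
import pika
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # the "store" container

def run_video_operator(path):
    # Placeholder for the real vidvec operator; returns a dummy vector.
    return [0.0] * 512

def on_message(ch, method, properties, body):
    payload = json.loads(body)  # assumed shape: {"id": ..., "path": <media url>}
    vector = run_video_operator(payload["path"])
    es.index(index="video-vectors", document={"id": payload["id"], "vec": vector})
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="video-index-queue")
channel.basic_consume(queue="video-index-queue", on_message_callback=on_message)
channel.start_consuming()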
- Modify the docker-compose.yml file to include a container for the worker and add the venv volume.
worker:
  container_name: feluda_worker
  build:
    context: ./src
    dockerfile: worker/vidvec/Dockerfile.video_worker
    target: production
    args:
      - "UID=${UID:-1000}"
      - "GID=${GID:-1000}"
  volumes:
    - ./src:/home/python/app/
    - venv:/home/python/app/venv/
  env_file: ./src/development.env
  command: tail -f /dev/null
  depends_on:
    store:
      condition: service_started
    queue:
      condition: service_started
volumes:
  venv: {}
For a pre-built ARM image, run the following command first and use the docker compose settings below. Refer to the multiarch/qemu-user-static documentation for details.
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
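To verify the registration worked, running an arm64 image on the x86 host should now succeed:
$ docker run --rm --platform linux/arm64 alpine uname -m
This prints aarch64 if the qemu handlers are in place.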
worker:
  image: <built-arm-image>
  platform: linux/arm64
  container_name: feluda_worker
  volumes:
    - /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static
  env_file: ./src/development.env
  command: tail -f /dev/null
  depends_on:
    store:
      condition: service_started
    queue:
      condition: service_started
- Start the docker containers
docker-compose up store queue worker
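Optionally, confirm the worker container is up before exec'ing into it:
docker ps --filter name=feluda_worker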
- Exec into the feluda_worker container and install the relevant Python libraries
docker exec --user python -it feluda_worker /bin/sh
Note
You can now run all the tests inside the worker, apart from those requiring python server.py
and tests for other operators. Follow the instructions listed here.
- Run the worker
Make sure you are in the /app folder in the docker container. Then run the worker/vidvec/video_worker.py file using the following command:
python -m worker.vidvec.video_worker
Keep the worker running and, in a new terminal, run the video_payload_writer script, which sends a payload (containing the media URLs) to the worker
python -m worker.vidvec.video_payload_writer
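The writer side of this pattern, as a hedged sketch (the queue name and payload fields are assumptions; the real script is worker/vidvec/video_payload_writer.py):
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="video-index-queue")
# Assumed payload shape: an id plus the media URL the worker should process.
payload = {"id": "test-1", "path": "https://example.com/video.mp4"}
channel.basic_publish(exchange="", routing_key="video-index-queue", body=json.dumps(payload))
connection.close()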
- To test whether the worker tries reconnecting to RabbitMQ when the queue crashes, follow the steps below.
- Bring up the docker containers individually.
docker-compose up -d store
docker-compose up -d queue
docker-compose up -d worker
- Run the worker
docker exec --user python -it feluda_worker /bin/sh
python -m worker.vidvec.video_worker
- Run the writer
docker exec --user python -it feluda_worker /bin/sh
python -m worker.vidvec.video_payload_writer
- The writer will add 15 messages to the queue, which the worker processes serially. While this processing is happening, bring down the queue container to stop RabbitMQ
docker-compose up -d --scale queue=0
Check the worker logs; they will show a disconnection error while the worker tries reconnecting.
- To bring the queue docker container back up
docker-compose up -d --scale queue=1
Now the worker should reconnect to RabbitMQ and start consuming messages where it left off.
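This retry behaviour generally amounts to catching the connection error and reconnecting after a delay. A minimal illustrative sketch (not Feluda's actual implementation; the queue name and callback are placeholders):
import time
import pika

def on_message(ch, method, properties, body):
    ch.basic_ack(delivery_tag=method.delivery_tag)  # placeholder callback

while True:
    try:
        connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
        channel = connection.channel()
        channel.queue_declare(queue="video-index-queue")
        channel.basic_consume(queue="video-index-queue", on_message_callback=on_message)
        channel.start_consuming()
    except pika.exceptions.AMQPConnectionError:
        # Covers both failed connects and a stream lost mid-consume;
        # unacked messages are redelivered once the queue is back.
        print("Lost connection to RabbitMQ, retrying in 5 seconds...")
        time.sleep(5)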
Audio Worker
This is a test worker that starts a RabbitMQ queue; the audio operator is run on the file and the output vector is stored in Elasticsearch.
- Modify the docker-compose.yml file to include a container for the worker.
worker:
  container_name: feluda_worker
  build:
    context: ./src
    dockerfile: worker/audiovec/Dockerfile.audio_worker
    target: production
    args:
      - "UID=${UID:-1000}"
      - "GID=${GID:-1000}"
  volumes:
    - ./src:/home/python/app/
    - venv:/home/python/app/venv/
  env_file: ./src/development.env
  command: tail -f /dev/null
  depends_on:
    store:
      condition: service_started
    queue:
      condition: service_started
volumes:
  venv: {}
- Start the docker containers
docker-compose up store queue worker
- Exec into the feluda_worker container and install the relevant Python libraries
docker exec --user python -it feluda_worker /bin/sh
Note
You can now run all the tests inside the worker, apart from those requiring python server.py
and tests for other operators. Follow the instructions listed here.
- Run the worker
Make sure you are in the /app folder in the docker container. Then run the worker/audiovec/audio_worker.py file using the following command:
python -m worker.audiovec.audio_worker
- Keep the worker running and, in a new terminal, run the audio_payload_writer script, which sends a payload (containing the media URLs) to the worker
python -m worker.audiovec.audio_payload_writer
Hash Worker
Follow similar steps as the Video Worker listed here.
- Modify the docker-compose.yml file to include containers for the worker and postgres.
worker:
  container_name: feluda_worker
  build:
    context: ./src
    dockerfile: worker/hash/Dockerfile.hash_worker
    target: production
    args:
      - "UID=${UID:-1000}"
      - "GID=${GID:-1000}"
  volumes:
    - ./src:/home/python/app/
    - venv:/home/python/app/venv/
  env_file: ./src/development.env
  command: tail -f /dev/null
  depends_on:
    store:
      condition: service_started
    queue:
      condition: service_started
postgres:
  container_name: postgres
  image: postgres@sha256:49fd8c13fbd0eb92572df9884ca41882a036beac0f12e520274be85e7e7806e9 # postgres:16.2-alpine3.19
  volumes:
    - ./data:/var/lib/postgresql/data
  environment:
    POSTGRES_USER: "tattle"
    POSTGRES_PASSWORD: "tattle_pw"
    POSTGRES_DB: "tattle_db"
  ports:
    - "5432:5432"
pgadmin:
  container_name: pgadmin
  image: dpage/pgadmin4@sha256:18cd5711fc9a7ed633a5c4b2b1a8f3e969d9262a94b8166c79fe0bba52697788 # dpage/pgadmin4:8.4
  environment:
    PGADMIN_DEFAULT_EMAIL: [email protected]
    PGADMIN_DEFAULT_PASSWORD: adminpassword
  ports:
    - "5050:80"
  volumes:
    - pgadmin_data:/var/lib/pgadmin
  depends_on:
    - postgres
  restart: always
volumes:
  pgadmin_data: {}
  venv: {}
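To verify the postgres container is reachable with the credentials above, a quick check from the host might look like this (assuming psycopg2-binary is installed; the compose file publishes port 5432):
import psycopg2

# Credentials mirror the environment block in the compose file above.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user="tattle",
    password="tattle_pw",
    dbname="tattle_db",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()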