How to handle dragonfly_pipeline_queue_length that hangs forever (pipeline hangs)?
#3997
-
Problem
(below is Python code, but I think it is irrelevant here, as the problem is related only to the DragonflyDB internal pipeline execution)

pipe = redis_client.pipeline()
key_count = 65536
for i in count():
    if i == key_count:
        break
    pipe.hset(hash_name, mapping={str(i): str(j) for j in range(10_000)})
pipe.execute()

Running pipe.execute() results in … Later in the log I have …
Versions

env:
  - name: DFLY_cache_mode
    value: "false"
  - name: DFLY_enable_heartbeat_eviction
    value: "false"
  - name: DFLY_dbnum
    value: "1"
  - name: DFLY_proactor_threads
    value: "2"
  - name: DFLY_dbfilename
    value: dump
  - name: DFLY_maxmemory
    value: "5100273664"
  - name: DFLY_logtostdout
    value: "true"
  - name: DFLY_aclfile
    value: /dragonfly/snapshots/acl.file
  - name: HEALTHCHECK_PORT
    value: "9999"
image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.24.0
Question
What should I do if a pipeline hangs forever inside DragonflyDB after I call pipe.execute()?

Update:
A smaller pipeline worked. My assumption now is that DragonflyDB silently hangs forever on a pipeline that is too large. How can it abort the pipeline and raise an error to the client?

Related
I searched for "pipeline", "dragonfly_pipeline_queue_length", and "Some commands are still being dispatched", but found no related open issues or discussions.
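As a client-side mitigation only (a minimal sketch assuming redis-py; the host, port, timeout, and batch size are placeholder assumptions, and this does not change DragonflyDB's behaviour), a socket timeout makes a stalled execute() raise an error instead of blocking forever, and splitting the work into smaller pipelines keeps each batch well below the size that hangs:

import redis

# socket_timeout (seconds) turns an indefinite stall into redis.exceptions.TimeoutError.
# Host, port, timeout and batch size below are placeholders, not recommended values.
redis_client = redis.Redis(host="localhost", port=6379, socket_timeout=30)

hash_name = "big-hash"   # placeholder key name
key_count = 65536
batch_size = 1024        # arbitrary; chosen because "a smaller pipeline worked"

pipe = redis_client.pipeline()
for i in range(key_count):
    pipe.hset(hash_name, mapping={str(i): str(j) for j in range(10_000)})
    if (i + 1) % batch_size == 0:
        # Drain replies before queuing more work; redis-py resets the
        # pipeline after execute(), so the same object can be reused.
        pipe.execute()
pipe.execute()  # flush any final partial batch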
-
Can you please provide a minimal reproducible example? This snippet is not clear to me. Are you saying it happens all the time?
-
Thanks again for putting in the effort to present the problem in the clearest way, and even exploring whether this is a regression and when it appeared 🍻
I have not yet explained why there is a regression. The PR that caused it is #3152
Before that, Dragonfly avoided the deadlock scenario above by reading all the input data from the socket into its memory buffers. Once it did that, the Python client could proceed with consuming the replies and the deadlock did not happen.
So Dragonfly simply read an unbounded number of requests - a weakness that could potentially lead to OOM.
This PR introduced limits to that: Dragonfly stopped reading requests if it had more than K bytes in pipeline buffers per IO thread.
pipeline_buffer_limit is the flag that controls that, and I just confirmed that docker run --network=host docker.dragonflydb.io/dragonflydb/dragon…
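Assuming the same DFLY_<flag_name> environment-variable convention shown in the Versions section above also applies to this flag, the limit could presumably be raised in the deployment config along the lines below; the variable name and value are illustrative assumptions, and the accepted value format should be checked against dragonfly --help:

  - name: DFLY_pipeline_buffer_limit
    value: "134217728"   # assumed byte count (128 MiB); verify the expected format first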