Stop issued in middle of handler can trigger deadlock? #1180

Open
vlovich opened this issue Jan 10, 2025 · 2 comments
Comments

vlovich commented Jan 10, 2025

Discovered this as part of #1179. I also seem to be observing a race condition where xitca occasionally never terminates the server even though a stop has been requested (graceful or ungraceful, it doesn't matter). However, I believe the stop has to be issued precisely while the handler is still in the middle of running.

My handler looks something like this:

    #[route("/tests/upload", method = post)]
    pub(super) async fn route_upload(
        ctx: &WebContext<'_, ServiceState>,
        Body(mut body): Body<RequestBody>,
    ) -> std::result::Result<WebResponse, xitca_web::error::Error<ServiceState>> {
        let mut received = 0;
        let mut chunk_id = 0;

        if let Some(state) = &ctx.state().test {
            state.notify("enter::route_upload").await;
        }

        while let Some(chunk) = body.next().await {
            match chunk {
                Ok(chunk) => {
                    if let Some(state) = &ctx.state().test {
                        state.notify(format!("route_upload::chunk-{chunk_id}")).await;
                    }
                    chunk_id += 1;
                    eprintln!("Read chunk {chunk_id:?} {:?} bytes", chunk.len());
                    received += chunk.len();
                }
                Err(e) => {
                    eprintln!("Chunk failed with error");
                    return Ok(WebResponse::builder()
                        .status(StatusCode::INTERNAL_SERVER_ERROR)
                        .body(ResponseBody::bytes(e.to_string()))
                        .unwrap())
                }
            }
        }

        eprintln!("Finished reading body");

        if let Some(state) = &ctx.state().test {
            state.notify("exit::route_upload").await;
        }

        return Ok(WebResponse::builder()
            .status(StatusCode::OK)
            .body(ResponseBody::bytes(format!("{received}")))
            .unwrap());
    }

state.test is a helper that lets me wait for and send checkpoint strings between the thread running the handler and the test harness.
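For context, here is a minimal sketch of what such a checkpoint helper could look like. This is an assumption, not the issue author's actual code: the real helper is async (the handler calls `.notify(...).await`), while this standalone analogue uses the standard library's synchronous channels so it can run on its own. The names `TestSync`, `TestHarness`, `notify`, `wait_for`, and `test_pair` are all hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Handler side: publishes checkpoint strings as the handler progresses.
pub struct TestSync {
    tx: mpsc::Sender<String>,
}

// Harness side: blocks until a named checkpoint is observed.
pub struct TestHarness {
    rx: mpsc::Receiver<String>,
}

impl TestSync {
    pub fn notify(&self, event: impl Into<String>) {
        // Ignore send errors: the harness may have finished early.
        let _ = self.tx.send(event.into());
    }
}

impl TestHarness {
    pub fn wait_for(&self, event: &str) -> String {
        loop {
            match self.rx.recv() {
                Ok(got) if got == event => return got,
                Ok(_) => continue, // skip unrelated checkpoints
                Err(_) => panic!("channel closed before {event:?} was observed"),
            }
        }
    }
}

pub fn test_pair() -> (TestSync, TestHarness) {
    let (tx, rx) = mpsc::channel();
    (TestSync { tx }, TestHarness { rx })
}

fn main() {
    let (sync, harness) = test_pair();
    // Stand-in for the handler thread emitting its entry/exit checkpoints.
    let handler = thread::spawn(move || {
        sync.notify("enter::route_upload");
        sync.notify("exit::route_upload");
    });
    assert_eq!(harness.wait_for("enter::route_upload"), "enter::route_upload");
    assert_eq!(harness.wait_for("exit::route_upload"), "exit::route_upload");
    handler.join().unwrap();
    println!("checkpoints observed");
}
```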

In my test harness, the reqwest body is a wrapper over a tokio::mpsc channel that is 1 message deep. The sequence is:

1. Wait for "enter::route_upload" to be sent from inside the handler.
2. Issue a graceful shutdown to the server handle.
3. Send the first chunk into the mpsc and wait for it to be acked by the handler.
4. Send the second chunk and wait for the ack.
5. Drop the writer and wait for the "exit::route_upload" event.

About 20% of the time, the thread.join on the thread running xitca hangs even though all the futures complete (the HTTP handler, the test harness, etc.). Running under a debugger I see that the xitca worker threads are all still running and none have taken any steps to shut down.
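The chunk/ack ordering above can be sketched with a standalone std-library analogue: a one-slot bounded channel stands in for the tokio::mpsc-backed reqwest body, and a reader thread stands in for route_upload. This is an illustration of the ordering only, under assumed names (`run_harness`); it does not involve xitca or reproduce the hang.

```rust
use std::sync::mpsc;
use std::thread;

// Drives the chunk/ack sequence described above and returns the byte count
// the "handler" side observed.
pub fn run_harness() -> usize {
    // 1-message-deep bounded channel, mirroring the tokio::mpsc in the harness.
    let (chunk_tx, chunk_rx) = mpsc::sync_channel::<Vec<u8>>(1);
    let (ack_tx, ack_rx) = mpsc::channel::<usize>();

    // Stand-in for route_upload: drain the body, acking each chunk.
    let reader = thread::spawn(move || {
        let mut received = 0;
        while let Ok(chunk) = chunk_rx.recv() {
            received += chunk.len();
            ack_tx.send(received).unwrap();
        }
        received // channel closed: the body is finished
    });

    // (In the real test, the graceful shutdown is issued here, before any chunk.)
    chunk_tx.send(vec![0u8; 4]).unwrap();
    assert_eq!(ack_rx.recv().unwrap(), 4);
    chunk_tx.send(vec![0u8; 8]).unwrap();
    assert_eq!(ack_rx.recv().unwrap(), 12);
    drop(chunk_tx); // dropping the writer ends the stream

    reader.join().unwrap()
}

fn main() {
    println!("handler saw {} bytes", run_harness());
}
```

In the real harness this ordering runs against a live server, and the hang appears only after the final step, when the server's worker threads fail to act on the shutdown that was requested back at step 2.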

vlovich commented Jan 11, 2025

If I set worker_max_blocking_threads to 1, the issue disappears AFAICT. If I set worker_threads to 1 or even 4 and worker_max_blocking_threads to 100, the issue also disappears (or at least I can't easily hit it in either case). So it seems to require both a high worker_max_blocking_threads and a high worker_threads (I have a 32-core machine).

Since I only ever make 1 request before requesting a shutdown, this seems like a thread synchronization bug within xitca.

fakeshadow (Collaborator) commented
Can't reproduce your claim. A standalone minimal example is needed for further investigation.
