
Processed request response stuck when workers are not available #247

Open · Pebin opened this issue Aug 9, 2024 · 20 comments

@Pebin commented Aug 9, 2024

Hello,

I started a bjoern server with 2 workers and sent 6 requests in parallel (scripts below).

The server just sleeps for 1 second and then responds.

What I would expect is that two requests would finish every second, so the durations would look like
1, 1, 2, 2, 3, 3

But the actual durations are 2, 3, 3, 3, 3, 3:

IO task 1: Request duration 0:00:02.009872
IO task 4: Request duration 0:00:03.010739
IO task 5: Request duration 0:00:03.010668
IO task 6: Request duration 0:00:03.010534
IO task 2: Request duration 0:00:03.011427
IO task 3: Request duration 0:00:03.011170
Elapsed time: 0:00:03.012082

Why is this happening? Am I setting up the workers incorrectly?

The times before and after the sleep call are exactly what I would expect; the following output is just grouped by worker:

1294942 - start 15:41:18.793109
1294942 - stop 15:41:19.794185
1294942 - start 15:41:19.794448
1294942 - stop 15:41:20.795519
1294942 - start 15:41:20.795629
1294942 - stop 15:41:21.796366

1294941 - start 15:41:18.793265
1294941 - stop 15:41:19.794420
1294941 - start 15:41:19.794570
1294941 - stop 15:41:20.795605
1294941 - start 15:41:20.795678
1294941 - stop 15:41:21.796336

If I send just 2 requests, then both are returned in 1 second:

IO task 1: Request duration 0:00:01.005457
IO task 2: Request duration 0:00:01.005327
Elapsed time: 0:00:01.006112

If I send three, then the duration of two of them is already 2 seconds 👀

IO task 1: Request duration 0:00:01.005968
IO task 3: Request duration 0:00:02.006177
IO task 2: Request duration 0:00:02.006436
Elapsed time: 0:00:02.007061

But doesn't the request end after the return? It seems like the worker is already busy with the next request before the previous response is sent.

bjoern 3.2.2

# server script
from time import sleep

import bjoern
import os, signal
from datetime import datetime

HOST = '0.0.0.0'
PORT = 8080
N_WORKERS = 2

worker_pids = []

def app(e, s):
    s('200 OK', [])
    print(f"worker id: {os.getpid()} - {e['PATH_INFO']} - start {datetime.now().strftime('%H:%M:%S.%f')}")
    sleep(1)
    print(f"worker id: {os.getpid()} - {e['PATH_INFO']} - stop {datetime.now().strftime('%H:%M:%S.%f')}")
    return b'%i: %s' % (
        os.getpid(),
        str(datetime.now()).encode('utf-8')
    )

bjoern.listen(app, HOST, PORT)
for _ in range(N_WORKERS):
    pid = os.fork()
    if pid > 0:  # parent
        worker_pids.append(pid)
    elif pid == 0:  # worker
        try:
            bjoern.run()
        except KeyboardInterrupt:
            pass
        exit()

try:
    for _ in range(N_WORKERS):
        os.wait()
except KeyboardInterrupt:
    for pid in worker_pids:
        os.kill(pid, signal.SIGINT)

# client script
import random
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
import requests

def run_io_tasks_in_parallel(tasks):
    with ThreadPoolExecutor() as executor:
        running_tasks = [executor.submit(task) for task in tasks]
        for running_task in running_tasks:
            running_task.result()

def task():
    now = datetime.now()
    request_id = random.randint(1, 100)
    print(f"request_id: {request_id} - start: {now}")
    requests.get(f"http://localhost:8080/request_id={request_id}")
    request_end = datetime.now()
    print(f"request_id: {request_id} - stop: {request_end}")
    return f"Request duration {(request_end - now)}, ID: {request_id}"

now = datetime.now()
run_io_tasks_in_parallel([
    lambda: print(f'IO task 1: {task()}'),
    lambda: print(f'IO task 2: {task()}'),
    lambda: print(f'IO task 3: {task()}'),
    # lambda: print(f'IO task 4: {task()}'),
    # lambda: print(f'IO task 5: {task()}'),
    # lambda: print(f'IO task 6: {task()}'),
])

print("Elapsed time:", datetime.now() - now)
@jonashaag (Owner) commented Aug 9, 2024

Interesting.

In the 2 requests vs. 3 requests case, can you please print for each request:

  • Client side:
    • Request ID generated by client and sent via URL or parameter
    • Request start and end time
  • Server side:
    • Worker ID
    • Request ID received from the client
    • Request start and end time

This way we can get a better understanding of how the requests are distributed.

@Pebin (Author) commented Aug 9, 2024

Thanks, here it is. I will also update the source in the first post so you can see where I put the logs.

CLIENT:
request_id: 64 - start: 2024-08-09 18:45:11.799936
request_id: 15 - start: 2024-08-09 18:45:11.800355
request_id: 54 - start: 2024-08-09 18:45:11.800637
request_id: 15 - stop: 2024-08-09 18:45:12.806486
IO task 2: Request duration 0:00:01.006131, ID: 15
request_id: 54 - stop: 2024-08-09 18:45:13.808474
IO task 3: Request duration 0:00:02.007837, ID: 54
request_id: 64 - stop: 2024-08-09 18:45:13.808592
IO task 1: Request duration 0:00:02.008656, ID: 64
Elapsed time: 0:00:02.009379

SERVER:

worker id: 1700300 - /request_id=15 - start 18:45:11.804739
worker id: 1700301 - /request_id=64 - start 18:45:11.804744
worker id: 1700301 - /request_id=64 - stop 18:45:12.805850
worker id: 1700300 - /request_id=15 - stop 18:45:12.805927
worker id: 1700301 - /request_id=54 - start 18:45:12.806003
worker id: 1700301 - /request_id=54 - stop 18:45:13.806613

@jonashaag (Owner) commented:

Interesting. It seems like the event loop of the server with the two requests prioritizes work on the new connection over closing the old one. I'm not sure why that is, or whether it's possible to change that behavior without changing the event loop implementation. I can have a look in the next couple of days.

Is your code a reduced reproducer of a real-world problem, or are we looking at a purely artificial test here? I wonder what the impact of this behavior is at larger scale and with non-trivial WSGI applications.

@Pebin (Author) commented Aug 9, 2024

Actually, I noticed it in production and tried to reduce it to as small an example as possible.

I was debugging why some requests are slower during heavy load, and why there are bigger differences between what I see in the application monitoring vs. the proxy monitoring. Then I noticed that even though the request had been handled, the response at the proxy level was sent at the same time as the response of the following request.

We don't log the worker ID there, but I feel confident that it's the same issue.

@jonashaag (Owner) commented:

Deleted some wrong insights that weren't reproducible. Behavior seems to be flaky.

@jonashaag (Owner) commented:

OK, it seems like the insight with Content-Length is actually reproducible:

If your application sets Content-Length correctly, the behavior seems to go away.

@jonashaag (Owner) commented:

OK, I think I understand the problem now.

If you don't set Content-Length but use Connection: keep-alive (the default with HTTP/1.1), you end up with Transfer-Encoding: chunked, which means the body is sent in chunks plus a special end-of-response chunk. bjoern currently does no optimization to send the end-of-response chunk together with the other chunk(s). So in your case the response is written in two pieces, and handling of the next request sneaks in between the sending of those pieces.

Follow-up questions:

  • Why does your application not send Content-Length?
  • How can we prioritize sending outstanding chunks before taking on new requests?
  • Is it worth doing the end-of-response merge optimization?
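
For reference, a quick way to see which case a response falls into is to check the headers on the client side. A minimal sketch (assuming the example server from the first post is listening on localhost:8080):

import requests

# Assumes the example server from the first post is running on localhost:8080.
resp = requests.get("http://localhost:8080/request_id=1")

# Without a Content-Length set by the WSGI app, an HTTP/1.1 keep-alive response
# falls back to Transfer-Encoding: chunked (body chunks plus a final end chunk).
print("Transfer-Encoding:", resp.headers.get("Transfer-Encoding"))
print("Content-Length:", resp.headers.get("Content-Length"))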

@Pebin (Author) commented Aug 9, 2024

Thank you, Jonas, for the thorough investigation.

I need to check the Content-Length. It's a customer-facing API and it sits behind a few proxies, so it could even be getting lost somewhere. But I would never have expected this to be needed.

Regarding the other two questions, I don't think I'm able to answer either of them; this isn't really my area. But prioritizing outstanding chunks sounds like something that should happen. In that case the merge optimization wouldn't be necessary, and even if it were done, it's hard to say whether it would have a big impact.

@jonashaag (Owner) commented:

I tried setting the write event priority to max and the read event priority to min, and it still shows the same behavior.

@Pebin (Author) commented Aug 10, 2024

I was also testing the Content-Length header, but it still doesn't work for me.

I tried these cases:

   requests.get(f"http://localhost:8080/request_id={request_id}", headers={"Content-Length": "0"})
   requests.post(f"http://localhost:8080/request_id={request_id}", data="data")
   requests.post(f"http://localhost:8080/request_id={request_id}", data="data", headers={"Content-Length": "4"})

The POST has Content-Length set automatically, but I even tried setting it manually, with no luck.

I verified on the server side that the Content-Length is indeed received.

@jonashaag (Owner) commented:

No, the Content-Length is missing in the response headers. Your WSGI application has to set it.
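
For example, the handler from the first post could set it like this (a minimal sketch; the full reproducer further down in this thread does the same thing):

def app(environ, start_response):
    body = b"hello"
    # Setting Content-Length on the response lets bjoern send it in one piece
    # instead of falling back to Transfer-Encoding: chunked.
    start_response('200 OK', [('Content-Length', str(len(body)))])
    return [body]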

@Pebin (Author) commented Aug 10, 2024

Ah 🤦 well, that makes much more sense now. I was just confused.

Anyway, it seems to help only partly. When it's added to the response it sometimes passes, but at least on my laptop it's around 1 pass vs. 5 fails, both when I use the latest bjoern with Content-Length manually added to the response and when I use your PR that adds it automatically.

I was checking the response headers and there is just this: {'Content-Length': '35', 'Connection': 'Keep-Alive'}
The 'Transfer-Encoding': 'chunked' header is gone, as you said, but for some reason it still doesn't work as expected for me.

Does it work reliably for you with the Content-Length?

@jonashaag (Owner) commented:

> The 'Transfer-Encoding': 'chunked' header is gone, as you said, but for some reason it still doesn't work as expected for me.

With the example application you've provided, or with a real application? If your real application sends too much data in the response, the response will be sent in multiple pieces and we're back to square one.
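
One way to test that with the example app would be to pad the response body so it no longer fits in a single write (a hypothetical tweak to the handler; the 1 MB size is an arbitrary choice for illustration):

def app(environ, start_response):
    # Pad the body so it is large enough to be written in several pieces.
    body = b"x" * 1_000_000
    start_response('200 OK', [('Content-Length', str(len(body)))])
    return [body]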

@Pebin (Author) commented Aug 11, 2024

No, just the example application. I will check the behavior of the real app later today.

@Pebin (Author) commented Aug 11, 2024

So I checked our real API, and the content length is usually around 100-1000 bytes, depending on the endpoint. In any case, the Content-Length header is present in the response. We are using connexion + Flask. I will try to replicate the issue with a small app to see whether it behaves the same.

@Pebin (Author) commented Aug 17, 2024

I played around with it again and changed the app to connexion + Flask, and also to just Flask. In both cases the Content-Length is always in the response, but the behavior stayed the same. Sometimes it responds fast, but most of the time it's just stuck there for 2 seconds.

@jonashaag (Owner) commented:

Unfortunately I can't reproduce this with the reproducer app.

# client script (cl.py)
import random
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
import requests

for _ in range(10):

    durations = []

    def run_io_tasks_in_parallel(tasks):
        with ThreadPoolExecutor() as executor:
            running_tasks = [executor.submit(task) for task in tasks]
            for running_task in running_tasks:
                running_task.result()

    def task():
        now = datetime.now()
        request_id = random.randint(1, 100)
        print(f"request_id: {request_id} - start: {now}")
        requests.get(f"http://localhost:8080/request_id={request_id}")
        request_end = datetime.now()
        print(f"request_id: {request_id} - stop: {request_end}")
        durations.append((request_end - now).total_seconds())
        return f"Request duration {(request_end - now)}, ID: {request_id}"

    now = datetime.now()
    run_io_tasks_in_parallel([
        lambda: print(f'IO task 1: {task()}'),
        lambda: print(f'IO task 2: {task()}'),
        lambda: print(f'IO task 3: {task()}'),
        # lambda: print(f'IO task 4: {task()}'),
        # lambda: print(f'IO task 5: {task()}'),
        # lambda: print(f'IO task 6: {task()}'),
    ])

    print("Elapsed time:", datetime.now() - now)
    print("Mean duration", sum(durations)/len(durations))

# server script
from time import sleep

import bjoern
import os, signal
from datetime import datetime

HOST = '0.0.0.0'
PORT = 8080
N_WORKERS = 2

worker_pids = []

def app(e, s):
    print(f"worker id: {os.getpid()} - {e['PATH_INFO']} - start {datetime.now().strftime('%H:%M:%S.%f')}")
    sleep(1)
    print(f"worker id: {os.getpid()} - {e['PATH_INFO']} - stop {datetime.now().strftime('%H:%M:%S.%f')}")
    res = b'%i: %s' % (
        os.getpid(),
        str(datetime.now()).encode('utf-8')
    )
    s('200 OK', [('Content-Length', str(len(res)))])
    return [res]

bjoern.listen(app, HOST, PORT)
for _ in range(N_WORKERS):
    pid = os.fork()
    if pid > 0:  # parent
        worker_pids.append(pid)
    elif pid == 0:  # worker
        try:
            bjoern.run()
        except KeyboardInterrupt:
            pass
        exit()

try:
    for _ in range(N_WORKERS):
        os.wait()
except KeyboardInterrupt:
    for pid in worker_pids:
        os.kill(pid, signal.SIGINT)
❯ pixi run python -u cl.py | grep Mean
Mean duration 1.3478
Mean duration 1.3454439999999999
Mean duration 1.3425943333333334
Mean duration 1.3462273333333332
Mean duration 1.346633
Mean duration 1.3440193333333335
Mean duration 1.3482213333333333
Mean duration 1.3486856666666664
Mean duration 1.3495133333333333
Mean duration 1.3465636666666667

@Pebin (Author) commented Aug 18, 2024

Interesting. That made me think that maybe I have some specific local setup, so I tried running the server in a Debian Docker image, since I am on openSUSE. I also tried multiple Python versions (3.6, 3.7, 3.8, 3.9), but the results are pretty random.

I did something like this:

docker run --rm -p 8080:8080 -it -v $(pwd):/usr/src/project python:3.9 bash
apt-get update
apt-get install libev-dev
pip install bjoern
cd /usr/src/project
python main.py

and then I just ran the test against the Docker container.

Mean duration 1.6728226666666668
Mean duration 1.338989
Mean duration 1.3378889999999999
Mean duration 1.672688
Mean duration 1.6752770000000001
Mean duration 1.6708706666666666
Mean duration 1.338372
Mean duration 1.6727043333333331
Mean duration 1.6715573333333333
Mean duration 1.33837

I even tried running both the server and the test inside Docker, with no difference.

By the way, when I run the tests in a loop like you did, it seems to get the correct response time more often 🤷 but that could just be my impression.

Anyway, it seems like it will stay a mystery 😁

@clock-workorange commented:

Don't stop, don't surrender, let's try to solve the mystery.

Maybe it's a library, or something like Fedora vs. Ubuntu, or the way it's compiled?

Try using another instance, etc. We need to figure out why this strange behavior happens.

@jonashaag (Owner) commented:

I was able to reproduce this in a Linux Docker container, so maybe it's a Linux thing. I can spend some time debugging this soon.
