
Send std::net:http requests #7742

Merged · 11 commits into master from 7727-http-worker · Sep 18, 2024
Conversation

Contributor

@scotttrinh scotttrinh commented Sep 12, 2024

TODO

  • Get batches of pending std::net::http::ScheduledRequest objects
  • Create httpx request task
  • Run this work queue on server
  • Add end-to-end tests with test HTTP server

Member

@fantix fantix left a comment


Overall this looks good!

response = await http_send(request)
request_state = 'Completed'
response_status = response.status_code
response_body = response.content
Member

Suggested change
response_body = response.content
response_body = await response.read()

content may not always be present; we should use read() here. That leads to an interesting question: if we fail only while reading the response body, should we still store the partial response we got?

Contributor Author

read isn't awaitable? It seems like it returns bytes here. Maybe the change is:

Suggested change
response_body = response.content
response_body = response.read()

The documentation doesn't give a really clear picture of how to do what we're trying to do here, which is to save the response bytes without any extra decoding. Although... maybe we want it decoded (to account for compression) and then re-encoded, so that we're saving something that's easy to deal with on the consumer's end?

Member

Oh sorry it should be await response.aread().

without any extra decoding

I suppose you meant the textual codecs (UTF-8, ISO-8859-1, etc.) rather than the content codings (gzip, deflate, etc.), right? There is response.aiter_text(), but then we'd need to handle decoding errors - coming back to the previous question of storing partial responses. I'd prefer we just keep the bytes (only decoding the gzip/deflate/... stuff).

Or maybe we can try to decode it into a str and - like you said - store the encoded bytes with confidence (say, with an additional optional field on the Response indicating the confirmed text encoding, or even always re-encoded as UTF-8), or just store the raw bytes if decoding fails.

Contributor Author

I suppose you meant the textual codecs (UTF-8, ISO-8859-1, etc.) instead of the protocol codes (gzip, deflate, etc.), right?

Ahh, I mostly meant compression, not text encodings, but it looks like response.content already handles decompression:

https://www.python-httpx.org/quickstart/#binary-response-content

Any gzip and deflate HTTP response encodings will automatically be decoded for you. If brotlipy is installed, then the brotli response encoding will be supported. If zstandard is installed, then zstd response encodings will also be supported.

Oh sorry it should be await response.aread().

👍

Or maybe we can try to decode it into str and - like you said - store encoded bytes with confidence (like with an additional optional field on the Response indicating the confirmed text encoding, or even always re-encoded as UTF-8), or just store the raw bytes if we failed to decode.

The main problem I was worried about was consumers having to read a bunch of headers to know how to decompress the bytes into the encoded text. Since that's not actually an issue here, I'm personally fine with just storing the raw bytes and having the consumer decode them themselves (most likely as UTF-8). Although maybe that's relatively hostile, since we imagine these responses will typically only be read at troubleshooting time, and typically via the UI, where raw bytes would be a subpar experience.

On the other hand, I really don't want to make this part of the module overly complicated, and the simplest thing to actually store is the raw bytes. @1st1 any thoughts here?



async def http_send(request: ScheduledRequest) -> httpx.Response:
async with httpx.AsyncClient() as client:
Member

Maybe we can cache an httpx.AsyncClient per tenant to pool connections, but that also leads to racing cookie issues among concurrent requests. So yeah, this is probably fine.

Member

Is the idea of caching AsyncClient to minimize the number of Python objects created? Is it expensive to instantiate? Or would it reuse some underlying resources?

Member

reuse some underlying resources

Yes, it reuses the TCP/HTTP connections with keep-alive, avoiding some connection establishment time (which we don't care about much, since this is supposed to be async and slow anyway). In the case of HTTP/2, it could multiplex concurrent requests over the same TCP connection, allowing higher concurrency with minimal resources.

The cost of creating an AsyncClient is not that high, and its Python objects are also cleared away after __aexit__(). However, the cookie store is shared among all requests on the same client, which may or may not be the expected behavior. For example, if AsyncClient is following redirects and the redirected request depends on a cookie from the initial response, two concurrent requests with redirects on the same domain may step on each other's toes.

At the end of the day, it's still simpler to just use a separate client per request. But I guess in the future we can make it an option/config for the user to decide, if needed.

Contributor Author

It's handy that the client can handle cookies, but I don't think any user (ourselves included) should depend on something stateful like cookies when using this module. Maybe there is a way to disable cookies altogether?

Contributor Author

Oh, one other nice thing we'd get out of sharing a client is that we can set the max connections in the client itself to avoid managing the concurrency directly ourselves.

Member

Maybe there is a way to disable cookies altogether?

encode/httpx#2992 looks like we'll need to hack our own way in.

we can set the max connections in the client itself

Yep, there's also that. And in the case of HTTP/2 it'll effectively be the hardware max connections (like the number of FDs, I believe), which is what we really want, I guess (instead of a logical number of concurrent requests).

Contributor Author

I think it makes sense to have a null cookie jar, share a single AsyncClient per-tenant, and set the max connections on the client. I'll throw that together and we can iterate from there if we want something more robust.
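A minimal sketch of the null-cookie-jar part, assuming the approach of supplying a custom stdlib CookieJar (NullCookieJar is a hypothetical name; the PR may implement this differently):

```python
import http.cookiejar


class NullCookieJar(http.cookiejar.CookieJar):
    """A cookie jar that silently drops every cookie, so responses can
    never mutate shared client state."""

    def set_cookie(self, cookie: http.cookiejar.Cookie) -> None:
        # CookieJar.extract_cookies() funnels every received cookie
        # through set_cookie(), so a no-op here discards all of them.
        pass
```

Since httpx wraps a plain CookieJar passed via its cookies= argument, the shared per-tenant client could then look something like httpx.AsyncClient(cookies=NullCookieJar(), limits=httpx.Limits(max_connections=...)).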


Comment on lines +97 to +99
await asyncio.sleep(
POLLING_INTERVAL.to_microseconds() / 1_000_000.0
)
Member

Are we planning to interrupt this sleep with notifications later?

Contributor Author

"notifications" meaning a Postgres notify event? I think so! I'm not sure exactly how to set that up, but the end goal is definitely push rather than polling (maybe with a longer poll still happening in the case of missed notifications).

Member

@fantix fantix left a comment

Awesome!

For now, treat errors getting the body bytes as a completed request
State is the request's state, and status is the response's status code.
@scotttrinh scotttrinh changed the title from wip to Send std::net:http requests on Sep 18, 2024
@scotttrinh scotttrinh marked this pull request as ready for review September 18, 2024 10:23
@scotttrinh scotttrinh requested a review from fantix September 18, 2024 10:24
Member

@fantix fantix left a comment

Good to go! 👍

@scotttrinh scotttrinh merged commit cbff7eb into master Sep 18, 2024
24 of 26 checks passed
@scotttrinh scotttrinh deleted the 7727-http-worker branch September 18, 2024 16:04