Send std::net::http requests #7742

Conversation
Overall this looks good!
edb/server/net_worker.py (outdated)

```python
response = await http_send(request)
request_state = 'Completed'
response_status = response.status_code
response_body = response.content
```
Suggested change:

```diff
-response_body = response.content
+response_body = await response.read()
```
`content` may not always be present, so we should use `read()` here. That raises an interesting question: if only reading the response body fails, should we still store the remaining partial response?
`read` isn't awaitable? It seems like it returns `bytes` here? Maybe the change is:

Suggested change:

```diff
-response_body = response.content
+response_body = response.read()
```
The documentation doesn't seem to give a really clear picture of how to do what we're trying to do here, which is to save the response bytes without any extra decoding. Although... maybe we want it decoded (to account for compression) and then re-encoded, so that we're saving something that's easy to deal with on the consumer's end?
Oh sorry, it should be `await response.aread()`.

> without any extra decoding

I suppose you meant the textual codecs (UTF-8, ISO-8859-1, etc.) rather than the protocol codings (gzip, deflate, etc.), right? There is `response.aiter_text()`, but then we'd need to handle decoding errors, which brings us back to the earlier question of storing partial responses. I'd prefer we just keep the `bytes` (only decoding the gzip/deflate/... stuff).

Or maybe we can try to decode it into `str` and, like you said, store encoded bytes with confidence (with an additional optional field on the `Response` indicating the confirmed text encoding, or even always re-encoded as UTF-8), and just store the raw bytes if decoding fails.
> I suppose you meant the textual codecs (UTF-8, ISO-8859-1, etc.) instead of the protocol codes (gzip, deflate, etc.), right?

Ahh, I mostly meant compression, not text encodings, but it looks like `response.content` already handles decompression:
https://www.python-httpx.org/quickstart/#binary-response-content

> Any gzip and deflate HTTP response encodings will automatically be decoded for you. If brotlipy is installed, then the brotli response encoding will be supported. If zstandard is installed, then zstd response encodings will also be supported.

> Oh sorry it should be await response.aread().

👍

> Or maybe we can try to decode it into str and - like you said - store encoded bytes with confidence (like with an additional optional field on the Response indicating the confirmed text encoding, or even always re-encoded as UTF-8), or just store the raw bytes if we failed to decode.

The main problem I was worried about was consumers having to read a bunch of headers to figure out how to decompress the bytes into encoded text, but since that's not actually the issue here, I'm personally fine with just storing the raw bytes and having the consumer decode them themselves, most likely as UTF-8. Although, maybe that's relatively hostile, since we imagine these responses will typically only be read at troubleshooting time, and usually via the UI, where raw bytes would be a subpar experience.

On the other hand, I really don't want to make this part of the module super complicated, and the simplest thing to actually store is the raw bytes. @1st1 any thoughts here?
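If we do end up storing raw bytes, the consumer-side decode stays simple; a hypothetical snippet (`stored_body` is an assumed variable, not part of the PR):

```python
# Hypothetical consumer-side decode of a stored raw body; errors="replace"
# keeps troubleshooting output readable even for non-UTF-8 payloads.
text = stored_body.decode("utf-8", errors="replace")
```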
edb/server/net_worker.py (outdated)

```python
async def http_send(request: ScheduledRequest) -> httpx.Response:
    async with httpx.AsyncClient() as client:
```
Maybe we can cache an `httpx.AsyncClient` per tenant to pool connections, but that also leads to cookie races among concurrent requests. So yeah, this is probably fine.
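For illustration, a per-tenant cache might look roughly like this (the `tenant_id` key and module-level dict are assumptions, not the PR's design):

```python
import httpx

# Hypothetical per-tenant cache of pooled clients. Note the caveat from
# this thread: all requests on one client share its cookie store.
_clients: dict[str, httpx.AsyncClient] = {}


def get_client(tenant_id: str) -> httpx.AsyncClient:
    client = _clients.get(tenant_id)
    if client is None:
        client = _clients[tenant_id] = httpx.AsyncClient()
    return client
    # Cached clients would eventually need `await client.aclose()`
    # on tenant shutdown.
```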
Is the idea of caching the `AsyncClient` to minimize the number of Python object instantiations? Is it expensive to instantiate? Or would it reuse some underlying resources?
> reuse some underlying resources

Yes, it reuses the TCP/HTTP connections via keep-alive, avoiding some connection-establishment time (which we don't care much about here, because this is supposed to be async and slow). With HTTP/2, it could multiplex concurrent requests over the same TCP connection, allowing higher concurrency with minimal resources.

The cost of creating an `AsyncClient` is not that high, and the Python objects are also cleaned up after `__aexit__()`. However, the cookie store is shared among all requests on the same client, which may or may not be the expected behavior. For example, `AsyncClient` would follow redirects; if the redirected request depends on a cookie from the initial response, two concurrent requests that both redirect on the same domain may step on each other's toes.

At the end of the day, it's still simpler to just use a separate client per request. But I guess in the future we can make it an option/config for the user to decide, if needed.
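For reference, keep-alive pooling and HTTP/2 are both configured on the client itself; a sketch assuming the `httpx[http2]` extra is installed (the limit values are arbitrary):

```python
import httpx

# Keep-alive pooling and HTTP/2 multiplexing are client-level settings;
# http2=True requires `pip install httpx[http2]`.
client = httpx.AsyncClient(
    http2=True,
    limits=httpx.Limits(
        max_connections=100,
        max_keepalive_connections=20,
        keepalive_expiry=30.0,  # seconds an idle connection is kept
    ),
)
```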
It's handy that the client can handle cookies, but I don't think any user (ourselves included) should depend on something stateful like cookies when using this module. Maybe there is a way to disable cookies altogether?
Oh, one other nice thing we'd get out of sharing a client is that we can set the max connections in the client itself to avoid managing the concurrency directly ourselves.
> Maybe there is a way to disable cookies altogether?

encode/httpx#2992 — looks like we'll need to hack our own way in.

> we can set the max connections in the client itself

Yep, there's also that. And with HTTP/2 it'll effectively be a hardware cap on connections (like the number of FDs, I believe), which I guess is what we really want (rather than a logical number of concurrent requests).
I think it makes sense to have a null cookie jar, share a single AsyncClient per-tenant, and set the max connections on the client. I'll throw that together and we can iterate from there if we want something more robust.
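A minimal sketch of that combination: a cookie jar that drops everything (in the spirit of the workaround discussed in encode/httpx#2992 — the class name here is ours), plus client-level connection limits:

```python
import http.cookiejar

import httpx


class NullCookieJar(http.cookiejar.CookieJar):
    """A cookie jar that refuses to store anything."""

    def extract_cookies(self, response, request):
        # Ignore all Set-Cookie headers from responses.
        pass

    def set_cookie(self, cookie):
        # Refuse cookies set directly as well.
        pass


def make_tenant_client(max_connections: int = 10) -> httpx.AsyncClient:
    # One shared client per tenant: pooled connections, capped
    # concurrency, and no cross-request cookie state.
    return httpx.AsyncClient(
        cookies=NullCookieJar(),
        limits=httpx.Limits(max_connections=max_connections),
    )
```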
```python
await asyncio.sleep(
    POLLING_INTERVAL.to_microseconds() / 1_000_000.0
)
```
Are we planning to interrupt this sleep with notifications later?
"notifications" meaning a Postgres notify
event? I think so! I'm not sure exactly how to set that up, but the end goal is definitely push rather than polling (maybe with a longer poll still happening in the case of missed notifications).
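One common shape for push-with-polling-fallback is waiting on an event with a timeout; a sketch where `wake` would be set by a hypothetical Postgres NOTIFY listener:

```python
import asyncio

POLL_INTERVAL = 60.0  # fallback poll period in seconds (illustrative)

wake = asyncio.Event()  # a NOTIFY listener would call wake.set()


async def worker_loop() -> None:
    while True:
        try:
            # Wake immediately on a notification, or fall back to
            # polling after POLL_INTERVAL seconds.
            await asyncio.wait_for(wake.wait(), timeout=POLL_INTERVAL)
        except asyncio.TimeoutError:
            pass
        wake.clear()
        # ... poll for and process scheduled requests here ...
```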
Awesome!
For now, treat errors getting the body bytes as a completed request
State is the request's state, and status is the response's status code.
Force-pushed from 8e13f77 to 1dd7aee
Force-pushed from 1dd7aee to 1eecfaa
Good to go! 👍
TODO: send `std::net::http::ScheduledRequest` objects