Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simpleflow workers: proxy mode #439

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ybastide
Copy link
Contributor

@ybastide ybastide commented Jul 1, 2024

For @jblernout 🙂


Here's the original request:

In Simpleflow, each worker performs a poll for activity.

When N workers run in parallel on the same instance, up to N connections are made to SWF to wait for an activity. If too many instances run in parallel, the maximum number of connections to SWF may be reached, preventing the workers from getting activities.

As a workaround, the number of workers has been limited in each instance. This number could be higher, which would improve the efficiency of each instance, reducing costs.

A proxy could be introduced at the instance level to fix the issue. The proxy would poll for activity on SWF, and the workers would poll for activity on the proxy.

The proxy would implement a simple HTTP server with a single thread (no parallel requests), receive POST requests with the poll details, forward them to SWF, and return SWF responses as an HTTP response.

The worker HTTP client timeout should be set to the same timeout as the SWF client in the proxy (+ some ms for the HTTP request handling).

The worker CLI would be modified to receive the proxy settings:

simpleflow worker.start --domain TestDomain --task-list quickstart --proxy=http://127.0.0.1:4242

A new CLI could be introduced to start the proxy:

simpleflow proxy.start --port=4242

The proxy would have to be started before the workers.


In terms of security, the included proxy is HTTP, meaning that it only sees the CONNECT swf.us-east-1.amazonaws.com:443 HTTP/1.0 (for instance) in cleartext; the actual connection is encrypted. That's because an MITM version needs more setup to pass the CA CERT to boto 🙂.

In terms of code, I mostly

  • removed an unused argument to ConnectedSWFObject.__init__
  • added the proxy
  • added a command line option for using it on workers

I'm defaulting to ::1 for localhost in this PR, but it can be changed.

@ewjoachim
Copy link
Contributor

Can you spare a few words what you're doing here ? This seems extremely low level compared to the rest of this lib, and I'd like to double check in terms of security before we ship this, if possible.

@ybastide
Copy link
Contributor Author

ybastide commented Jul 2, 2024

Can you spare a few words what you're doing here ? This seems extremely low level compared to the rest of this lib, and I'd like to double check in terms of security before we ship this, if possible.

Sure, thanks for the reminder! Updating the description to add context.

@jblernout
Copy link

But if it's a simple HTTP proxy that doesn't do MITM, does that mean it doesn't necessarily reduce the number of connections to SWF? And if, at some point in the future, you want to put a global proxy that decides which worker can poll, without MITM, you can't control which worker does what ?

@ybastide
Copy link
Contributor Author

ybastide commented Jul 3, 2024

No. of simultaneous connections: doesn't the singlethreadness take care of this?
global proxy: wouldn't the source address be enough?

@ybastide ybastide changed the base branch from wip/python-3.12 to improvement/support-python-3.12 July 5, 2024 15:39
@ewjoachim
Copy link
Contributor

ewjoachim commented Jul 6, 2024

No. of simultaneous connections: doesn't the singlethreadness take care of this?

In that case why are we doing a proxy and not just a file-based mutex ? Also, is that going to impact performance ? (how much of a worker's time is spent waiting for the SWF API ? If it's significant, then serializing the API calls might make the worker slower, especially if the aim is to have more workers per container.

I think a file-based mutex is probably enough if we can accept the performance cost. If we want to significantly reduce the number of connections (and potentially speed up simpleflow by avoiding connection time), we should 1/ ensure that we reuse connections between calls of the same process (we probably already do that, but it might be worth double-checking) and 2/ it might be interesting to imagine a kind of proxy like you do but with connection sharing (or a pool). I could imagine multiple ways of doing that, from very low level to high level. It might be worth discussing (with me or any other interested folk)

ybastide added 10 commits July 8, 2024 11:48
Signed-off-by: Yves Bastide <[email protected]>
Signed-off-by: Yves Bastide <[email protected]>
Remove unused `boto3_client` argument.

Signed-off-by: Yves Bastide <[email protected]>
Signed-off-by: Yves Bastide <[email protected]>
Signed-off-by: Yves Bastide <[email protected]>
And remove no-ops --log-level

Signed-off-by: Yves Bastide <[email protected]>
Signed-off-by: Yves Bastide <[email protected]>
Signed-off-by: Yves Bastide <[email protected]>
@ybastide ybastide force-pushed the task/DATA-19619/SWF-Introduce-a-poll-proxy-for-workers branch from 618a3c9 to 8d23671 Compare July 8, 2024 10:00
@ybastide
Copy link
Contributor Author

ybastide commented Jul 8, 2024

In that case why are we doing a proxy and not just a file-based mutex?

Because it looked like the easiest solution 🙂

[Snip many worthwhile points]
It might be worth discussing (with me or any other interested folk)

Let's discuss this when you're back; thanks!

ybastide added 2 commits July 8, 2024 18:42
Signed-off-by: Yves Bastide <[email protected]>
Signed-off-by: Yves Bastide <[email protected]>
@ybastide ybastide force-pushed the improvement/support-python-3.12 branch 3 times, most recently from ed02741 to 44911a3 Compare July 26, 2024 14:21
Base automatically changed from improvement/support-python-3.12 to main July 29, 2024 12:53
@ewjoachim
Copy link
Contributor

(I'm back, btw)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

3 participants