How to increase the async HTTP connection limit? #873

Open
ion-elgreco opened this issue May 2, 2024 · 7 comments

@ion-elgreco

ion-elgreco commented May 2, 2024

I want to increase the HTTP connection limit to see if I can saturate my network more, but I don't see a way to pass this through the FileSystem. I went through the code, and aiobotocore as well, but no luck yet. Increasing max_pool_connections already helps a bit, though: it increases IO by 2x.

Any suggestions on how to increase the concurrency?

@martindurant
Member

There are many levers to pull, actually. How are you setting the pool, what kind of benchmark are you running, and do you have an idea of what your current bottleneck may be caused by? Since fsspec generally maintains its own IO thread/loop, a significant increase in performance is something I'd be happy to bake in.

@ion-elgreco
Author

ion-elgreco commented May 2, 2024

@martindurant I am currently passing this to the S3FileSystem: config_kwargs={"max_pool_connections": 50}.
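For reference, a minimal sketch of how that setting reaches the client. s3fs forwards `config_kwargs` to the botocore client configuration, where `max_pool_connections` caps the HTTP connection pool. The helper names `s3_filesystem_kwargs` and `open_filesystem`, and the optional `endpoint_url` (useful for an S3-compatible gateway), are illustrative assumptions rather than anything taken from this thread.

```python
def s3_filesystem_kwargs(max_pool_connections=50, endpoint_url=None):
    """Build keyword arguments for s3fs.S3FileSystem (hypothetical helper)."""
    # config_kwargs is handed to the botocore Config object.
    kwargs = {"config_kwargs": {"max_pool_connections": max_pool_connections}}
    if endpoint_url is not None:
        # Assumption: a non-AWS, S3-compatible endpoint is being used.
        kwargs["client_kwargs"] = {"endpoint_url": endpoint_url}
    return kwargs


def open_filesystem(**overrides):
    """Create the filesystem; needs the third-party s3fs package."""
    import s3fs  # imported lazily so the helper above works without it

    return s3fs.S3FileSystem(**s3_filesystem_kwargs(**overrides))
```

Calling `open_filesystem(max_pool_connections=100)` would then give a filesystem with a larger pool; whether that actually raises throughput depends on where the bottleneck is.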

I was checking the peak transfer rate with iftop; it was just 50Mb out of 1Gbps network capacity (AKS -> LakeFS on AKS -> Azure Blob). It took around 15 seconds to read 6000 txt files. I think it could go faster, but I'm not sure :)

@martindurant
Member

Would you mind making a graph of max_pool versus throughput? How many files (~ coroutines) are in flight?
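One possible way to collect the numbers for such a graph is sketched below. It assumes the files are read with a single bulk `fs.cat(paths)` call, which returns a dict of path to bytes when given a list. The function names, the default pool sizes, and the `make_s3fs` factory are illustrative assumptions; any callable that maps a pool size to a filesystem would do.

```python
import time


def measure_throughput(fs, paths):
    """Read all paths with one fs.cat() call and return throughput in MB/s."""
    start = time.perf_counter()
    data = fs.cat(paths)  # with a list input, fsspec returns {path: bytes}
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    total_bytes = sum(len(blob) for blob in data.values())
    return total_bytes / elapsed / 1e6


def sweep_pool_sizes(make_fs, paths, pool_sizes=(10, 25, 50, 100, 200)):
    """Return {pool_size: MB/s}, using a fresh filesystem for each pool size."""
    return {size: measure_throughput(make_fs(size), paths) for size in pool_sizes}


def make_s3fs(pool_size):
    """Example factory; needs the third-party s3fs package."""
    import s3fs  # imported lazily so the functions above work without it

    return s3fs.S3FileSystem(
        config_kwargs={"max_pool_connections": pool_size},
        skip_instance_cache=True,  # do not reuse a cached filesystem instance
    )
```

`sweep_pool_sizes(make_s3fs, paths)` gives one point per pool size, which can then be plotted. Repeating each measurement a few times would help separate the effect of the pool size from caching and network noise.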

@ion-elgreco
Author

@martindurant do you have some examples of how to access these things during execution?

@martindurant
Member

  • I thought throughput was exactly what you were already measuring.
  • You should be able to get the number of files from a normal glob or expand_path call.
  • You could maybe use callbacks to count the coroutines, but you would probably need to hack something into fsspec.asyn._runner.
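As an alternative to patching fsspec internals, the in-flight count can be tracked from the outside by wrapping the coroutines you launch yourself. The sketch below is a generic asyncio helper, not an fsspec API: `InFlightCounter`, `fetch_all`, and the `limit` parameter are hypothetical names, and it only sees coroutines started through it, not ones fsspec creates internally.

```python
import asyncio


class InFlightCounter:
    """Tracks how many wrapped coroutines are running at once."""

    def __init__(self):
        self.current = 0  # coroutines started but not yet finished
        self.peak = 0     # highest value `current` has reached

    async def track(self, coro):
        self.current += 1
        self.peak = max(self.peak, self.current)
        try:
            return await coro
        finally:
            self.current -= 1


async def fetch_all(fetch, paths, counter, limit=None):
    """Await `fetch(path)` for every path, optionally capping concurrency."""
    semaphore = asyncio.Semaphore(limit) if limit else None

    async def one(path):
        if semaphore is None:
            return await counter.track(fetch(path))
        async with semaphore:  # at most `limit` fetches run at a time
            return await counter.track(fetch(path))

    return await asyncio.gather(*(one(p) for p in paths))
```

With s3fs, `fetch` could be the `_cat_file` coroutine of a filesystem created with `asynchronous=True`, and the file count is simply `len(fs.glob(pattern))`. Note that this counts coroutines, not open HTTP connections; the connection pool may still serialize some of them.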

@martindurant
Member

Ping, since this just came up on another thread. @ion-elgreco, have you had a chance to do any more benchmarking or testing?

@ion-elgreco
Author

@martindurant hey, I parked improving it further since it worked "good enough".
