Slow internet connection, parallel uploads time out, entire home network unusable #438
Comments
Thanks for the report! The queue of "uploading" files contains all of the registered file changes, but they won't actually all be uploaded in parallel. At present, the number of parallel uploads is determined by the number of CPU cores times 3 and maxes out at 32, so 12 upload threads for a quad-core CPU. In addition, uploading is throttled if CPU usage becomes too high (the threshold can be changed in the config file). This is because content hashing to determine sync conflicts is quite CPU intensive. I agree, it would be good to also limit bandwidth usage. Instead of directly limiting the number of parallel uploads or downloads, I'd however rather expose a config value for the max bandwidth usage, similar to what is available for CPU usage at the moment. For the time being, you should be able to exploit the CPU usage setting for your purposes by reducing the max CPU usage from 20% per core to a lower value, e.g., 2%. The documentation at https://maestral.app/docs/configfile explains how this can be done. This will result in uploads being automatically throttled. Downloads will continue as is since we don't currently perform content hashing during downloads; however, most internet providers allow for larger download bandwidth compared to uploads.
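For illustration only, the thread-count rule described in this comment (CPU cores times 3, capped at 32) could be sketched as follows. The function name and parameters are hypothetical and are not taken from Maestral's code base.

```python
import os


def upload_thread_count(cpu_cores=None, per_core=3, cap=32):
    """Hypothetical helper: worker count = cores * per_core, capped at `cap`."""
    if cpu_cores is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        cpu_cores = os.cpu_count() or 1
    return min(cap, cpu_cores * per_core)


print(upload_thread_count(4))   # quad-core CPU -> 12
print(upload_thread_count(16))  # 48 would exceed the cap -> 32
```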
@samschott That makes sense! Thank you for the explanation and the work you're doing on Maestral. If this is something you feel like I could help out with, I'm happy to try to dive into the code and contribute a PR.
If you are happy to look into this, a PR will be very welcome! Bandwidth limits should apply across all parallel data transfers since a single upload or download can still max out the entire available bandwidth. We therefore ideally want to track total bandwidth usage over a sliding window (e.g., a few seconds) and rate limit chunked uploads or downloads accordingly. The actual uploads and downloads are handled by the
Downloads: We currently iterate over the response content in chunks of 8192 bytes. Rate limiting could easily plug in here and throttle / pause between iterations.
Uploads: Files that are larger than 5 MB are uploaded in chunks of 5 MB with each chunk being uploaded in a separate POST request. That chunk size may already be too large for an effective bandwidth limit. However, I'm not sure if uploading smaller chunks with a larger number of requests is the best way to go here. Maybe we need a custom
As you see, it won't be an easy problem. I'm happy to help, especially with questions about the current code base.
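A minimal sketch of the sliding-window idea from this comment, under stated assumptions: `SlidingWindowLimiter` and `throttled_copy` are hypothetical names, not part of Maestral or the Dropbox SDK. One limiter instance would be shared by all transfer threads, and each thread pauses between chunks for as long as the limiter asks.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical helper: tracks bytes moved by all transfers in a window."""

    def __init__(self, max_bytes_per_sec, window=2.0, clock=time.monotonic):
        if window <= 0:
            raise ValueError("window must be positive")
        self.max_bytes_per_sec = max_bytes_per_sec
        self.window = window  # seconds of history to keep
        self._clock = clock
        self._events = deque()  # (timestamp, nbytes) pairs, oldest first
        self._total = 0  # bytes currently inside the window
        self._lock = threading.Lock()  # shared by all transfer threads

    def record(self, nbytes):
        """Register nbytes just transferred; return seconds the caller should sleep."""
        with self._lock:
            now = self._clock()
            self._events.append((now, nbytes))
            self._total += nbytes
            # Forget transfers that have fallen out of the window.
            while self._events[0][0] <= now - self.window:
                self._total -= self._events.popleft()[1]
            # At the limit, the bytes in the window should have taken this long.
            required = self._total / self.max_bytes_per_sec
            elapsed = now - self._events[0][0]
            return max(0.0, required - elapsed)


def throttled_copy(chunks, write, limiter, sleep=time.sleep):
    """Write each chunk, pausing between iterations as the limiter asks."""
    for chunk in chunks:
        write(chunk)
        delay = limiter.record(len(chunk))
        if delay > 0:
            sleep(delay)
```

For downloads, `chunks` could be any iterator that yields 8192-byte pieces of the response body. For uploads with 5 MB chunks the same limiter would still work, but each pause would be correspondingly long.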
I've been debugging network connectivity issues for several months, and concluded that Maestral is the culprit:
In other words, Maestral makes the network completely unusable on my computer (in addition to crowding out other devices on the network). All of this is with I'd love to see some sort of stop-gap solution within Maestral to prevent the extreme network hogging behaviour, even if it isn't perfect. EDIT: I'm having some luck with |
Indeed, What is needed is indeed explicit limiting of bandwidth usage. Throttling downloads will be easy since the Dropbox SDK supports streaming downloads. Throttling uploads requires a bit more work to do well, because the Python Dropbox SDK does not support streaming uploads due to potential difficulty with retrying failures on non-rewindable streams. This means that the easiest way to limit upload speed is to manually pause between individual POST requests, each transferring around 4 MB in Maestral's case (this is chosen both for performance and to limit the total number of API calls required for an upload). As a result, upload throttling will likely be spiky, with the target bandwidth only achieved on average. There are alternatives, but they require bypassing the Dropbox SDK and will therefore be a lot more work. Are you seeing the most issues with uploads or downloads?
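To make the "pause between POST requests" approach concrete, here is a rough sketch of how the pause after each chunk could be computed. The function name is hypothetical and the megabyte is assumed to be decimal (10^6 bytes); this is not Maestral's actual implementation.

```python
MB = 10**6  # assumption: decimal megabytes


def pause_after_chunk(chunk_bytes, transfer_seconds, limit_bytes_per_sec):
    """Seconds to wait after a chunk so the average rate meets the limit."""
    # Time the chunk is allowed to occupy at the target rate.
    target_seconds = chunk_bytes / limit_bytes_per_sec
    # If the transfer was already slower than the limit, do not wait at all.
    return max(0.0, target_seconds - transfer_seconds)


# A 4 MB chunk that took 2 s to send, under a 1 MB/s limit:
print(pause_after_chunk(4 * MB, 2.0, 1 * MB))  # 2.0
```

In this example the link carries 2 MB/s for two seconds and then sits idle for two seconds, which is the spiky behaviour mentioned above: the 1 MB/s target is only met on average.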
Filed a feature request with the Dropbox SDK to allow chunked or streaming uploads: dropbox/dropbox-sdk-python#459. If they agree to drop the current limitation, this should greatly simplify implementing bandwidth usage limits.
Although I normally have more trouble with upload speeds, I'm currently setting up a new computer and would really be grateful for download bandwidth throttling in Maestral at the moment. I've spent a long time on workarounds, and I've concluded that I can't really do this properly without support in Maestral itself.
😍😍😍 I've noticed you've released |
Okay, I was able to sort of copy: https://github.com/samschott/maestral/blob/main/.github/workflows/publish.yml#L18-L26
lgarron@pythagoras ~/C/g/g/s/m/d/m/src (main)> python3 -m maestralbuild bandwidth-limit up 0.75
✓ Upload bandwidth limit set to 0.75 MB/sec.
lgarron@pythagoras ~/C/g/g/s/m/d/m/src (main)> python3 -m maestralbuild bandwidth-limit down 3
✓ Download bandwidth limit set to 3.0 MB/sec.
lgarron@pythagoras ~/C/g/g/s/m/d/m/src (main)> python3 -m maestralbuild stop
Stopping Maestral... [KILLED]
lgarron@pythagoras ~/C/g/g/s/m/d/m/src (main)> python3 -m maestralbuild start
Starting Maestral.../Users/lgarron/Library/Python/3.9/lib/python/site-packages/dropbox/session.py:1: UserWarning: Module maestral was already imported from /Users/lgarron/Library/Python/3.9/lib/python/site-packages/maestral/__init__.py, but /Users/lgarron/Code/git/github.com/samschott/maestral/dist/maestral-1.6.6.dev1/src is being added to sys.path
import pkg_resources
Starting Maestral... [OK]
But it seems I can't actually run it directly:
> python3 -m maestralbuild activity
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/lgarron/Code/git/github.com/samschott/maestral/dist/maestral-1.6.6.dev1/src/maestralbuild/__main__.py", line 5, in <module>
    main(prog_name="maestral")
  File "/Users/lgarron/Library/Python/3.9/lib/python/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/lgarron/Library/Python/3.9/lib/python/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/lgarron/Library/Python/3.9/lib/python/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/lgarron/Library/Python/3.9/lib/python/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/lgarron/Library/Python/3.9/lib/python/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/lgarron/Code/git/github.com/samschott/maestral/dist/maestral-1.6.6.dev1/src/maestralbuild/cli/common.py", line 109, in wrapper
    return ctx.invoke(f, proxy, *args, **kwargs)
  File "/Users/lgarron/Library/Python/3.9/lib/python/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/lgarron/Code/git/github.com/samschott/maestral/dist/maestral-1.6.6.dev1/src/maestralbuild/cli/common.py", line 34, in wrapper
    return func(*args, **kwargs)
  File "/Users/lgarron/Code/git/github.com/samschott/maestral/dist/maestral-1.6.6.dev1/src/maestralbuild/cli/cli_info.py", line 136, in activity
    info = f"{arrow[event.direction]} {event.change_type.name}"
KeyError: <SyncDirection.Down: 'down'>
Then I tried placing the built Python code into the existing published app, but macOS won't launch that app (presumably due to the app signature). Edit: Wait, that last Franken-app hack might have worked! There's nothing in the menu bar, but This is probably a crazy way to run the app, but it will make a big difference for me over the next few days. Thanks so much, @samschott!
Oh my, that should not have worked. Out of curiosity, how did you get it to run? Just by double-clicking the app or through some other black magic? As you note, modifying the files should invalidate the code signature and result in macOS refusing to launch the app. In any case, I'm glad the bandwidth limit is working well for you :) You can also download a pre-release version from https://nightly.link/samschott/maestral-cocoa/actions/artifacts/541569343.zip.
Yeah, I'm kind of shocked as well!
Thanks! This one works just as well, and has the benefit of having a working menu item. :-D
This may be unrelated, but since trying out the new pre-release, my brand-new M2 Mac Mini is crashing mysteriously about once a day.
Also, Maestral seems to be running into dozens of sync errors while downloading files around 1 GB to 10 GB in size. I'm not really complaining; I can keep running Maestral until it has downloaded all my backups and stabilizes. Just thought I'd report what I'm seeing.
@lgarron, what kind of sync errors are you seeing? The large number of wake-ups may be related to throttling and may indeed crash the app (but should not crash the entire OS!). Throttling is implemented by pausing download threads for short amounts of time after each 2 kB downloaded. The duration is determined by the download rate limit, while the number of parallel download threads is currently set to 4 * CPU_CORE_COUNT, up to a maximum of 64 threads. The initial reasoning was that CPU usage due to parallel content hashing would be the limiting factor for thread count. Parallel data transfers can now also be problematic, given that we more regularly pause and wake up threads. The combination of short sleep times and a large number of threads could indeed cause > 150 wakeups per second. Out of interest, how many CPU cores does your shiny new M2 Mac Mini have? (Very short sleep times occur when the bandwidth limit is set close to the actual speed of your network connection. Maestral then does not need to sleep for long to stay below the limit, given the time spent on the actual transfer of each 2 kB chunk.) The solution will likely be to (1) set an upper bound for the thread count below 64 and (2) not sleep for exceedingly short periods of time.
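One possible way to approach point (2), sketched under assumptions: rather than sleeping after every 2 kB chunk, a thread could accumulate the small delays it owes and only sleep once they add up to a minimum duration. `CoarseSleeper` and its `min_sleep` default are hypothetical and are not how Maestral actually implements throttling.

```python
import time


class CoarseSleeper:
    """Hypothetical helper: batch many tiny delays into fewer, longer sleeps."""

    def __init__(self, min_sleep=0.05, sleep=time.sleep):
        self.min_sleep = min_sleep  # shortest sleep worth a thread wakeup
        self._sleep = sleep
        self._owed = 0.0  # delay accumulated but not yet slept

    def delay(self, seconds):
        """Add `seconds` to the debt; sleep only when the debt is large enough.

        Returns True if the thread actually slept.
        """
        self._owed += seconds
        if self._owed < self.min_sleep:
            return False
        owed, self._owed = self._owed, 0.0
        self._sleep(owed)
        return True


# Eight chunk delays of 0.125 s each collapse into two sleeps of 0.5 s.
naps = []
sleeper = CoarseSleeper(min_sleep=0.5, sleep=naps.append)
for _ in range(8):
    sleeper.delay(0.125)
print(naps)
```

The total time slept stays the same, so the average rate is unchanged, while the number of thread wakeups drops by the batching factor. The trade-off is a slightly burstier transfer within each batch.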
After restarting several times, the number of errors has gone down from 42 to 35, so the issue doesn't seem to be deterministic per file. Other sync failures are for similar video files that are also several GB each:
I have ≈50 Mbps down and have recently been trying 4 MB/s (== 32 Mbps) for the Maestral download limit. I'm gonna try going to 2 MB/s to see if that's any better.
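As a quick check of the unit conversion used here (megabytes per second versus megabits per second, 8 bits per byte), with hypothetical helper names:

```python
def mbytes_to_mbits(mb_per_sec):
    """Convert MB/s to Mbps (8 bits per byte)."""
    return mb_per_sec * 8


def mbits_to_mbytes(mbit_per_sec):
    """Convert Mbps to MB/s."""
    return mbit_per_sec / 8


print(mbytes_to_mbits(4))   # 32
print(mbytes_to_mbits(2))   # 16
print(mbits_to_mbytes(50))  # 6.25
```

So a 4 MB/s limit uses about two thirds of a 50 Mbps line, and 2 MB/s about one third.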
Darn. Were you seeing similar corruption errors as well with the stable version of Maestral? The throttling logic, especially for downloads, should not cause any data corruption by itself. The wakeup problem is almost definitely caused by throttling; I did not even know that the macOS kernel had such limits. You live and learn :)
Could you also post the full diagnostic report from which you cited? I suspect that it may not be a crash report but contain a line such as |
Date/Time: 2023-02-06 23:42:01.013 -0800
End time: 2023-02-06 23:42:37.549 -0800
OS Version: macOS 13.2 (Build 22D49)
Architecture: arm64e
Report Version: 40
Incident Identifier: C3BC2A63-288A-4CF5-9A4B-081A51F02571
Yeah, I have a few such files, and they seem to contain |
Unblocking on a future may be handled as a thread interrupt and some kernels, in particular Darwin, limit how often those are allowed per second. See #438.
Thanks, that is very helpful! Especially the stack traces from Python. My initial suspicion about the regular sleep calls was apparently incorrect, it is the actual socket communication which is causing thread wakeups. This might be either:
Thanks! Is there an easy way for me to get updated nightly links to test, by any chance? :-D
Unfortunately not at the moment. Nightly builds are not really "nightly" but created only for each new tag.
Describe the bug
My internet connection is fairly slow (about 2 Mbps up) and, as a result, Maestral seems to get stuck in a loop and eat up bandwidth to the point of making the entire internet connection unusable when trying to upload a bunch of files at the same time. After monitoring with maestral activity, I noticed that it tries to upload over 30 files in parallel, all of them time out and do not complete, and then it restarts the whole process of uploading over 30 files in parallel, getting stuck in this never-ending cycle. I've searched the docs for a way to configure the maximum number of parallel uploads, but there doesn't seem to be such an option. My current workaround is to move all the files out of the folder and drop them in one by one, so that Maestral doesn't try to upload them all in parallel and crash my internet connection.
To Reproduce
Drag a folder of over 100 images into maestral on a slow internet connection.
Expected behaviour
Hopefully would be able to configure max parallel uploads to prevent this kind of problem.
System: