Maestral keeps indexing and indexing #133
Did you forget to attach the logs? It is possible that the internet connection is lost at some point while indexing, which will currently trigger a restart instead of resuming where it left off. Do you see “Connecting...” in the logs just before indexing starts again? This would also explain the seemingly arbitrary number of indexed items. |
Thanks for the reply! I actually realized I forgot to add the logs and was adding them as you answered! See below (I've raised the log level to aid debugging). It is indeed as you describe, except there isn't really a loss of internet connection. Also, for the past (so many) times I looked while debugging, the reset seems to occur at the same place (135910 files). This seems slightly suspicious to me, though suspicious of what I'm not sure. I looked around in the code and it appears that lower-level errors are not handled at all. I haven't really worked with the Dropbox API before, so I'm not sure whether the cursor remains valid after a connection error. For now, I've added another ...
2020-05-02 08:31:12 maestral.client INFO: Indexing 135411...
2020-05-02 08:32:12 maestral.sync DEBUG: Connecting...
Traceback (most recent call last):
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.7/http/client.py", line 1354, in getresponse
response.begin()
File "/usr/lib/python3.7/http/client.py", line 306, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kosta/maestral/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 725, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/util/retry.py", line 403, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 428, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.dropboxapi.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kosta/maestral/git/maestral/maestral/client.py", line 110, in wrapper
return func(*args, **kwargs)
File "/home/kosta/maestral/git/maestral/maestral/client.py", line 600, in list_folder
more_results = self.dbx.files_list_folder_continue(results[-1].cursor)
File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/base.py", line 1954, in files_list_folder_continue
None,
File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/dropbox.py", line 274, in request
timeout=timeout)
File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/dropbox.py", line 365, in request_json_string_with_retry
timeout=timeout)
File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/dropbox.py", line 449, in request_json_string
timeout=timeout,
File "/home/kosta/maestral/lib/python3.7/site-packages/requests/sessions.py", line 578, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/home/kosta/maestral/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/home/kosta/maestral/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/home/kosta/maestral/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.dropboxapi.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kosta/maestral/git/maestral/maestral/sync.py", line 2664, in startup_worker
sync.get_remote_folder()
File "/home/kosta/maestral/git/maestral/maestral/sync.py", line 343, in wrapper
res = func(self, *args, **kwargs)
File "/home/kosta/maestral/git/maestral/maestral/sync.py", line 1888, in get_remote_folder
include_deleted=False, limit=500)
File "/home/kosta/maestral/git/maestral/maestral/client.py", line 115, in wrapper
raise ConnectionError('Cannot connect to Dropbox')
ConnectionError: Cannot connect to Dropbox
2020-05-02 08:32:12 maestral.sync INFO: Connecting...
2020-05-02 08:32:12 maestral.sync INFO: Downloading your Dropbox
2020-05-02 08:32:13 maestral.client INFO: Indexing 8...
..... |
Update:
This is the added clause, which sits right below the previous one:

    except (requests.exceptions.ReadTimeout) as exc:
        logger.error("Read timeout")
        if self._retry_count >= retry:
            self._retry_count = 0
            raise exc
        import time
        secs = min(2 ** self._retry_count, 60)  # exponential backoff, capped at 60 seconds
        logger.debug(f'Retrying after sleeping {secs} secs...')
        time.sleep(secs)
        self._retry_count += 1

Any ideas as to what might be going on? Finally, I'm assuming that my logged error somehow did it but when I now run
Edit: |
Seems related to dropbox/dropbox-sdk-python#155 |
Thank you for the investigation! This does seem to be a Dropbox API issue, but unlike dropbox/dropbox-sdk-python#155, retrying the call seems to work for you. Did you also try to just increase the requests timeout and reduce the number of items requested per call from 500 to maybe 100 or 50? Intuitively, I would handle this independently of the retry count for ... In any case, I will try to reproduce this myself and flag the issue with Dropbox again with a bit more information than the original report. |
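For illustration, here is a rough sketch of those two knobs in the Dropbox Python SDK. This is not Maestral's actual code; the access token, path, and values are placeholders. The client constructor accepts a timeout in seconds and files_list_folder accepts a limit on the number of entries returned per call:

    import dropbox

    # Placeholder token; Maestral manages its own credentials internally.
    dbx = dropbox.Dropbox("ACCESS_TOKEN", timeout=120)  # longer than the 60 s read timeout seen in the traceback above

    # Request fewer entries per call; the server may still return fewer than asked for.
    res = dbx.files_list_folder("/some/folder", recursive=True, limit=100)
    print(f"first page: {len(res.entries)} entries, has_more={res.has_more}")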
Regarding the error logging: Maestral uses logging as follows: ...
I should probably document this in the developer documentation, which is currently limited to API documentation and is very much a work in progress. |
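As a small generic illustration of the logging point discussed here (an error logged with a traceback versus a plain warning), using only the standard library rather than Maestral's actual setup:

    import logging

    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger(__name__)  # module-level logger, as in maestral.client / maestral.sync

    def flaky_call():
        # Hypothetical stand-in for a Dropbox API request that times out.
        raise TimeoutError("simulated read timeout")

    try:
        flaky_call()
    except TimeoutError:
        # error() with exc_info=True attaches the full traceback to the log record,
        # while warning() without it records only the message.
        logger.error("Read timeout", exc_info=True)
        logger.warning("Retrying after timeout")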
Thanks for the info about logging! I'll just change to ... It seemed to me that most of the fetches took a few seconds, so I tried reducing the timeouts to 5 seconds, but apparently that was too short, as the timeouts started to occur much earlier (at around ~20K files IIRC; the logs are gone now, it seems). I did not want to increase them for fear of making the indexing take a very long time. I also tried increasing the limit of items to fetch to 2000 (also in the hope of reducing indexing time), and happily this seems to be enough: I was able to index ~600K files, and they are currently downloading. |
Wow, that is a large Dropbox folder! Are your 600k items nested or in a flat hierarchy? I tried to reproduce the timeout with 150k folders but everything worked well. A warning does not require any exc_info, so that should be fine. I am surprised that increasing the limit to 2000 items per call prevents timeouts; I would have expected the individual calls to take longer even if the total time is reduced. But who knows how the Dropbox backend works. For now, I will remove any limits and let Dropbox decide how many items it wants to return per call. And it does make sense to introduce a certain number of retries on timeout, maybe even for the other API calls. |
There is a hierarchy, but some folders have quite a few items. I agree that it's not very intuitive that the timeouts are resolved by increasing rather than decreasing the item count. Since the timeouts always occurred at roughly the same place, I guess they might be related to the total time or the total number of requests involved in handling the listing. If this is indeed the case, you may be able to reproduce the timeouts by reducing the item limit to e.g. 100. Retrying on connection errors seems like the right thing to do... |
I agree; it is likely that limiting the number of returned items somehow involves multiple requests on Dropbox's backend and can therefore time out more easily. This should hopefully be fixed in the new release, where timeouts are handled more gracefully and there is no limit on the number of returned entries. However, feel free to reopen the issue if it persists. |
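A rough sketch of that approach (no explicit limit, plus a few retries with backoff on read timeouts), assuming an already authenticated dropbox.Dropbox client; the function name and retry policy here are illustrative rather than the code shipped in the release:

    import time
    import requests
    import dropbox

    def list_folder_with_retries(dbx: dropbox.Dropbox, path: str, max_retries: int = 5):
        # No limit argument: the server decides how many entries to return per call.
        res = dbx.files_list_folder(path, recursive=True)
        entries = list(res.entries)
        retries = 0
        while res.has_more:
            try:
                res = dbx.files_list_folder_continue(res.cursor)
            except requests.exceptions.ReadTimeout:
                if retries >= max_retries:
                    raise
                time.sleep(min(2 ** retries, 60))  # exponential backoff, capped at 60 s
                retries += 1
                continue  # reuse the same cursor; retrying with it worked in the tests above
            retries = 0
            entries.extend(res.entries)
        return entries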
Great, thanks! |
An initial sync appeared to work, until at some point I ran maestral status and got an error telling me to restart. Since then, Maestral appears to keep indexing until crashing and restarting. I've set logging to DEBUG but I don't seem to get extra output. Here is a sample of the log. The same appears to repeat over and over, except that the indexed amount changes slightly before it restarts. Some of the time, the amount of files changes drastically, e.g. it indexed ~300K at one time and ~100K at another. Most of the time, however, the amount seems to be relatively stable.
System:
I'm using the version from PyPI. Looking at the git log, it doesn't appear that there were any changes since the release that would affect this issue, but I'll try it and report back.