
harbour-oc-daemon: High CPU usage / I/O load after network switch #46

nephros opened this issue Sep 4, 2020 · 6 comments


nephros commented Sep 4, 2020

From time to time the daemon goes berserk and causes very high CPU usage.
It also spams the log/journal, causing journald to show high load as well.

This of course causes tremendous battery drain (up to 20%/h according to sysmon).

Stopping/restarting the daemon fixes the daemon's CPU load, but journald still seems confused afterwards.
Restarting journald as well puts things back to normal.

Environment:

  • SailfishOS Version: 3.3.0.16 (Rokua)
  • Device: Sony Xperia 10 Dual SIM
  • SW versions: harbour-owncloud-0.9.5-1, harbour-owncloud-daemon-0.9.5-1

Steps to reproduce:

  1. Enable harbour-owncloud-daemon.service.
  2. Set it to sync over the mobile network.
  3. Be connected to the mobile network.
  4. Turn off the mobile network.
  5. Turn on WLAN.

I can at least sometimes trigger it using this procedure.

The journal is then filled with the following, about 300 times per second (!):

Sep 04 12:23:47 Sailfish harbour-owncloud-daemon[15523]: [W] unknown:0 - QIODevice::read (QDisabledNetworkReply): device not open
Sep 04 12:23:47 Sailfish harbour-owncloud-daemon[15523]: [W] unknown:0 - QIODevice::read (QDisabledNetworkReply): device not open
Sep 04 12:23:47 Sailfish harbour-owncloud-daemon[15523]: [W] unknown:0 - QIODevice::read (QDisabledNetworkReply): device not open
Sep 04 12:23:47 Sailfish harbour-owncloud-daemon[15523]: [W] unknown:0 - QIODevice::read (QDisabledNetworkReply): device not open
Sep 04 12:23:47 Sailfish harbour-owncloud-daemon[15523]: [W] unknown:0 - QIODevice::read (QDisabledNetworkReply): device not open
Sep 04 12:23:47 Sailfish harbour-owncloud-daemon[15523]: [W] unknown:0 - QIODevice::read (QDisabledNetworkReply): device not open

Top shows this:

top - 12:26:53 up 22:46,  4 users,  load average: 3.11, 2.19, 2.04
Tasks: 683 total,   3 running, 680 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni, 99.3 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   2672.6 total,    197.1 free,   1868.1 used,    607.3 buff/cache
MiB Swap:   1024.0 total,    853.3 free,    170.7 used.    873.2 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 7955 root      20   0   10240   4792   4416 R  98.4   0.2   6:29.70 systemd-journal
15523 nemo      20   0   67352  16324  12156 R  91.2   0.6   4:37.13 harbour-ownclou
21903 nemo      30  10   12956   3852   2760 R   2.0   0.1   0:00.30 top
fredldotme (Owner) commented:

I wonder why this is happening; is this an invalidated/closed running connection that the common code doesn't handle right yet?

We do abort running sync operations in the daemon, but I guess more is needed:
https://github.com/fredldotme/harbour-owncloud/blob/master/src/daemon/main.cpp#L136

I'll take a look at it when I find the time to do so; that might not be within the next month.
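For illustration only, the kind of handling I mean is roughly the following sketch; the commandQueue hooks in the comments are made-up stand-ins, not the actual code behind the link above:

// Sketch, not the daemon's real code: abort in-flight transfers when the
// active bearer changes, so nothing keeps reading from a dead QNetworkReply.
#include <QCoreApplication>
#include <QNetworkConfigurationManager>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QNetworkConfigurationManager netConfig;

    // Fires when e.g. mobile data goes down and WLAN comes up.
    QObject::connect(&netConfig, &QNetworkConfigurationManager::onlineStateChanged,
                     [](bool online) {
        qDebug() << "online state changed:" << online;
        // Hypothetical hooks standing in for the daemon's own queue handling:
        // commandQueue->stop();    // abort running sync operations
        // if (online)
        //     commandQueue->run(); // re-schedule sync once a bearer is back
    });

    return app.exec();
}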


nephros commented Sep 6, 2020

> I wonder why this is happening; is this an invalidated/closed running connection that the common code doesn't handle right yet?
>
> We do abort running sync operations in the daemon, but I guess more is needed:
> https://github.com/fredldotme/harbour-owncloud/blob/master/src/daemon/main.cpp#L136
>
> I'll take a look at it when I find the time to do so; that might not be within the next month.

Could it be related to this being a dual-SIM device and the code assuming there can be only one™ mobile connection? (Just a stab in the dark here.)

Oh, and one more thing: I think the roughly 300 log entries per second is journald's RateLimitBurst setting kicking in. So if we are hitting that rate limit, the daemon is possibly spamming at a much, much higher rate than that.

grep RateLim /etc/systemd/journald.conf
RateLimitInterval=30s
RateLimitBurst=300

Here's journald complaining about the log abuse:

zcat log.txt.gz  |grep -i suppre
Sep 04 12:19:20 PGXperia10 systemd-journal[7955]: Suppressed 2505 messages from /user.slice/user-100000.slice
Sep 04 12:19:50 PGXperia10 systemd-journal[7955]: Suppressed 48988 messages from /user.slice/user-100000.slice
Sep 04 12:20:20 PGXperia10 systemd-journal[7955]: Suppressed 464013 messages from /user.slice/user-100000.slice
Sep 04 12:20:50 PGXperia10 systemd-journal[7955]: Suppressed 443019 messages from /user.slice/user-100000.slice
Sep 04 12:21:27 PGXperia10 systemd-journal[7955]: Suppressed 467555 messages from /user.slice/user-100000.slice
Sep 04 12:22:11 PGXperia10 systemd-journal[7955]: Suppressed 441814 messages from /user.slice/user-100000.slice
Sep 04 12:22:59 PGXperia10 systemd-journal[7955]: Suppressed 439300 messages from /user.slice/user-100000.slice
Sep 04 12:23:47 PGXperia10 systemd-journal[7955]: Suppressed 446999 messages from /user.slice/user-100000.slice
Sep 04 12:24:17 PGXperia10 systemd-journal[7955]: Suppressed 473444 messages from /user.slice/user-100000.slice
Sep 04 12:24:54 PGXperia10 systemd-journal[7955]: Suppressed 471786 messages from /user.slice/user-100000.slice
Sep 04 12:25:37 PGXperia10 systemd-journal[7955]: Suppressed 446899 messages from /user.slice/user-100000.slice
Sep 04 12:26:51 PGXperia10 systemd-journal[7955]: Suppressed 409600 messages from /user.slice/user-100000.slice
Sep 04 12:27:42 PGXperia10 systemd-journal[7955]: Suppressed 418954 messages from /user.slice/user-100000.slice
Sep 04 12:28:58 PGXperia10 systemd-journal[7955]: Suppressed 422551 messages from /user.slice/user-100000.slice
Sep 04 12:30:05 PGXperia10 systemd-journal[7955]: Suppressed 408250 messages from /user.slice/user-100000.slice
Sep 04 12:31:19 PGXperia10 systemd-journal[7955]: Suppressed 313741 messages from /user.slice/user-100000.slice
Sep 04 12:32:12 PGXperia10 systemd-journal[7955]: Suppressed 428339 messages from /user.slice/user-100000.slice

I wonder if simply finding a way to not log as much would do away with the IO/CPU load issue, even without finding and fixing the underlying problem.
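For illustration, one stopgap could be installing a Qt message handler in the daemon that drops this one warning before it ever reaches journald. This is only a sketch, not code from this repo, and it would hide the logging symptom rather than fix the busy loop itself:

// Sketch only: filter the flooding warning out of the daemon's log output.
#include <QCoreApplication>
#include <QString>
#include <QtGlobal>
#include <cstdio>

static QtMessageHandler previousHandler = nullptr;

static void filteredHandler(QtMsgType type, const QMessageLogContext &ctx,
                            const QString &msg)
{
    // Drop the specific warning that repeats hundreds of times per second.
    if (type == QtWarningMsg && msg.contains(QStringLiteral("QDisabledNetworkReply")))
        return;

    if (previousHandler)
        previousHandler(type, ctx, msg); // pass everything else through unchanged
    else
        fprintf(stderr, "%s\n", qPrintable(msg));
}

int main(int argc, char *argv[])
{
    previousHandler = qInstallMessageHandler(filteredHandler);
    QCoreApplication app(argc, argv);
    // ... the daemon's normal setup would follow here ...
    return app.exec();
}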

fredldotme (Owner) commented:

If memory serves me right, it must be a Qt-internal error message, so fixing the underlying problem is the way to go. It might be related to me switching back to keeping the QWebdav instance around; maybe it should be refreshed with every CommandQueue task.

Currently the QWebdav instance is only created once, in the WebDav-specific CommandQueue implementation, and only whenever settings change (which, in the case of the daemon, is never): https://github.com/fredldotme/harbour-owncloud/blob/master/src/common/src/provider/storage/webdavcommandqueue.cpp#L38

So basically, whenever the pointer is passed through the getWebDav() calls, it should probably create a new instance instead.
Though, measures have to be taken to clean up the client when the CommandEntity task is done.
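Roughly what I have in mind, as a sketch only; applySettings(), the task parameter, and the CommandEntity::finished signal below are illustrative stand-ins rather than the existing API:

// Sketch: hand each task its own QWebdav client instead of reusing one
// long-lived instance, and tear it down once the task is done.
QWebdav *WebDavCommandQueue::getWebDav(CommandEntity *task)
{
    QWebdav *client = new QWebdav(this);
    applySettings(client); // hypothetical: hostname, root path, credentials, SSL

    // Clean up the per-task client when the task using it has finished,
    // so clients don't accumulate over the daemon's lifetime.
    QObject::connect(task, &CommandEntity::finished,
                     client, &QObject::deleteLater);
    return client;
}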

I'll take a stab at it if you don't beat me to it. :)

fredldotme (Owner) commented:

@nephros do you think you can follow the build instructions and build a copy of GhostCloud yourself using the Sailfish SDK? I don't have it installed on my machine right now.


nephros commented Sep 8, 2020

> @nephros do you think you can follow the build instructions and build a copy of GhostCloud yourself using the Sailfish SDK? I don't have it installed on my machine right now.

I also do not have the SDK/build environment set up at the moment. So the answer is yes, eventually, but not in the very near future.


nephros commented Sep 14, 2020

> @nephros do you think you can follow the build instructions and build a copy of GhostCloud yourself using the Sailfish SDK? I don't have it installed on my machine right now.

So, I have successfully built the RPMs using GitLab CI.

What would you have me do, now that I can build stuff?
