Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Irssi uses 100% CPU time #75

Open
Crazyguy opened this issue Apr 28, 2014 · 9 comments
Open

Irssi uses 100% CPU time #75

Crazyguy opened this issue Apr 28, 2014 · 9 comments

Comments

@Crazyguy
Copy link

Hello,

I have this issue where Irssi starts using 100% CPU. The time seems to correlate with the following messages from the Twirssi debug log:

03:21:08 'timeline' => '460572445892571136'03:21:08 }
03:21:08 };
03:26:03 get_updates starting upd=1
03:26:03 bg_setup starting upd=1
03:26:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 0
03:26:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 Giving up on polling /tmp/tw_9366_Kbf6.done
03:31:03 get_updates starting upd=1
03:31:03 bg_setup starting upd=1
03:31:08 checking child log at /tmp/tw_9366_c3jr.done mtime=1398645063 v 0
03:31:13 checking child log at /tmp/tw_9366_c3jr.done no /tmp/tw_9366_c3jr.19390 v 1398645063
03:31:13 +063 teromiettinen@Twitter Loading home_timeline page 1
03:31:13 +064 page: $VAR1 = {};

@gedge
Copy link
Collaborator

gedge commented Apr 28, 2014

How long have you been running twirssi, and how long has this problem been evident?

(heard a few, recent reports from others, on this subject)

@gedge
Copy link
Collaborator

gedge commented Apr 28, 2014

are you on debian?

@Crazyguy
Copy link
Author

The problem has occurred for maybe a year (I have been using Twirssi for about 3 years). The issue starts after anywhere from a day to a month of Irssi uptime. I'm not sure if computer uptime makes a difference.

Yes, I've been using Debian stable.

@mmitch
Copy link
Contributor

mmitch commented Apr 28, 2014

Same here:

  • first occurence of the problem could be a year ago
  • problem occurs randomly - sometimes after weeks of irssi uptime, sometimes after a day. Lately, more a day than a month that's why I'm starting to look into the issue :-)
  • Debian stable

I'm currently checking two things (at once :-)

  • put the waitpid( -1, WNOHANG) into a loop like seen on perldoc -f waitpid – if there are multiple dead childs, only one will be reaped at a time
  • remove the calls to Irssi::pidwait_remove() because the irssi documentation makes it sound like without this call children will be reaped automatically (quote from http://www.irssi.org/documentation/perl : it is usually not necessary to call this function

@mmitch
Copy link
Contributor

mmitch commented Apr 28, 2014

Additional info: The child process seems to be stuck in an endless loop talking to twitter:

% strace -p 26734 2>&1 | head -10

Process 26734 attached - interrupt to quit
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)

% lsof -n -p 26734
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[...]
irssi 26734 mitch 11u REG 0,18 0 676486626 /tmp/tw_8171_lX8K.26734
irssi 26734 mitch 12u IPv4 676487523 0t0 TCP 87.175.121.166:34436->199.16.156.199:https (CLOSE_WAIT)
irssi 26734 mitch 13u IPv4 676488332 0t0 TCP 87.175.121.166:34465->199.16.156.199:https (ESTABLISHED)
[...]

% dig -x 199.16.156.199 +nocomment

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> -x 199.16.156.199 +nocomment
;; global options: +cmd
;199.156.16.199.in-addr.arpa. IN PTR
156.16.199.in-addr.arpa. 12 IN SOA ns1.p34.dynect.net. root.twitter.com. 7 3600 600 604800 60
;; Query time: 0 msec
;; SERVER: 192.168.0.105#53(192.168.0.105)
;; WHEN: Fri Apr 25 19:37:25 2014
;; MSG SIZE rcvd: 115

@Crazyguy
Copy link
Author

Is it bad if I just kill the process as a workaround?

@mmitch
Copy link
Contributor

mmitch commented Apr 30, 2014

You can safely kill the processes, but you might accumulate zombies that way. At least the 100% CPU should be gone. A workaround is incoming :-)

@mmitch
Copy link
Contributor

mmitch commented Apr 30, 2014

For my workaround see #76
The patch should be verified and tested - on my box, it has already successfully prevented two runaway child processes today. This looks something like this:

10:42 ERROR: Giving up on polling=/tmp/tw_19010_7yvj.done child_pid=28336 parent_pid=19010
10:42 ERROR: remove_child: child was still running, killing it now! child_pid=28336 parent_pid=19010
10:42 ERROR: Haven't been able to get updated tweets since 10:35
10:47 UPDATE: Update succeeded.

@mmitch
Copy link
Contributor

mmitch commented Apr 30, 2014

The root cause of the 100% CPU load seems to be a bug somwhere in the LWP modules, see https://rt.cpan.org/Public/Bug/Display.html?id=78920

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants