Irssi uses 100% CPU time #75

Crazyguy · 2014-04-28T04:23:12Z

Hello,

I have this issue where Irssi starts using 100% CPU. The time seems to correlate with the following messages from the Twirssi debug log:

03:21:08 'timeline' => '460572445892571136'03:21:08 }
03:21:08 };
03:26:03 get_updates starting upd=1
03:26:03 bg_setup starting upd=1
03:26:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 0
03:26:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 Giving up on polling /tmp/tw_9366_Kbf6.done
03:31:03 get_updates starting upd=1
03:31:03 bg_setup starting upd=1
03:31:08 checking child log at /tmp/tw_9366_c3jr.done mtime=1398645063 v 0
03:31:13 checking child log at /tmp/tw_9366_c3jr.done no /tmp/tw_9366_c3jr.19390 v 1398645063
03:31:13 +063 teromiettinen@Twitter Loading home_timeline page 1
03:31:13 +064 page: $VAR1 = {};

gedge · 2014-04-28T20:42:16Z

How long have you been running twirssi, and how long has this problem been evident?

(heard a few, recent reports from others, on this subject)

gedge · 2014-04-28T20:47:48Z

are you on debian?

Crazyguy · 2014-04-28T20:59:12Z

The problem has occurred for maybe a year (I have been using Twirssi for about 3 years). The issue starts after anywhere from a day to a month of Irssi uptime. I'm not sure if computer uptime makes a difference.

Yes, I've been using Debian stable.

mmitch · 2014-04-28T21:22:29Z

Same here:

first occurence of the problem could be a year ago
problem occurs randomly - sometimes after weeks of irssi uptime, sometimes after a day. Lately, more a day than a month that's why I'm starting to look into the issue :-)
Debian stable

I'm currently checking two things (at once :-)

put the waitpid( -1, WNOHANG) into a loop like seen on perldoc -f waitpid – if there are multiple dead childs, only one will be reaped at a time
remove the calls to Irssi::pidwait_remove() because the irssi documentation makes it sound like without this call children will be reaped automatically (quote from http://www.irssi.org/documentation/perl : it is usually not necessary to call this function

mmitch · 2014-04-28T21:33:30Z

Additional info: The child process seems to be stuck in an endless loop talking to twitter:

% strace -p 26734 2>&1 | head -10

Process 26734 attached - interrupt to quit
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)

% lsof -n -p 26734
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[...]
irssi 26734 mitch 11u REG 0,18 0 676486626 /tmp/tw_8171_lX8K.26734
irssi 26734 mitch 12u IPv4 676487523 0t0 TCP 87.175.121.166:34436->199.16.156.199:https (CLOSE_WAIT)
irssi 26734 mitch 13u IPv4 676488332 0t0 TCP 87.175.121.166:34465->199.16.156.199:https (ESTABLISHED)
[...]

% dig -x 199.16.156.199 +nocomment

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> -x 199.16.156.199 +nocomment
;; global options: +cmd
;199.156.16.199.in-addr.arpa. IN PTR
156.16.199.in-addr.arpa. 12 IN SOA ns1.p34.dynect.net. root.twitter.com. 7 3600 600 604800 60
;; Query time: 0 msec
;; SERVER: 192.168.0.105#53(192.168.0.105)
;; WHEN: Fri Apr 25 19:37:25 2014
;; MSG SIZE rcvd: 115

Crazyguy · 2014-04-29T04:53:11Z

Is it bad if I just kill the process as a workaround?

mmitch · 2014-04-30T18:19:01Z

You can safely kill the processes, but you might accumulate zombies that way. At least the 100% CPU should be gone. A workaround is incoming :-)

mmitch · 2014-04-30T19:17:27Z

For my workaround see #76
The patch should be verified and tested - on my box, it has already successfully prevented two runaway child processes today. This looks something like this:

10:42 ERROR: Giving up on polling=/tmp/tw_19010_7yvj.done child_pid=28336 parent_pid=19010
10:42 ERROR: remove_child: child was still running, killing it now! child_pid=28336 parent_pid=19010
10:42 ERROR: Haven't been able to get updated tweets since 10:35
10:47 UPDATE: Update succeeded.

mmitch · 2014-04-30T19:34:25Z

The root cause of the 100% CPU load seems to be a bug somwhere in the LWP modules, see https://rt.cpan.org/Public/Bug/Display.html?id=78920

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Irssi uses 100% CPU time #75

Irssi uses 100% CPU time #75

Crazyguy commented Apr 28, 2014

gedge commented Apr 28, 2014

gedge commented Apr 28, 2014

Crazyguy commented Apr 28, 2014

mmitch commented Apr 28, 2014

mmitch commented Apr 28, 2014

Crazyguy commented Apr 29, 2014

mmitch commented Apr 30, 2014

mmitch commented Apr 30, 2014

mmitch commented Apr 30, 2014

Irssi uses 100% CPU time #75

Irssi uses 100% CPU time #75

Comments

Crazyguy commented Apr 28, 2014

gedge commented Apr 28, 2014

gedge commented Apr 28, 2014

Crazyguy commented Apr 28, 2014

mmitch commented Apr 28, 2014

mmitch commented Apr 28, 2014

Crazyguy commented Apr 29, 2014

mmitch commented Apr 30, 2014

mmitch commented Apr 30, 2014

mmitch commented Apr 30, 2014