/proc/pagetypeinfo: Cannot convert '>100000' to float #1156

Open
dwreski opened this issue Dec 14, 2020 · 13 comments

Comments

@dwreski

dwreski commented Dec 14, 2020

Hi,
I'm having an issue with what I believe is the meminfo plugin on Fedora 33 (although this error has been around forever).

2020/12/14 12:15:18 [ERROR] In RRD: Error updating /var/lib/munin/bwimail03/bwimail03-pagetypeinfo-n0_zNormal_tMovable-fp_n0_zNormal_tMovable_o0-g.rrd: /var/lib/munin/bwimail03/bwimail03-pagetypeinfo-n0_zNormal_tMovable-fp_n0_zNormal_tMovable_o0-g.rrd: Function update_pdp_prep, case DST_GAUGE - Cannot convert '>100000' to float

Maybe it needs to be cast as a "long float" somewhere?

I believe it's with the load_pagetypeinfo function, but I haven't attempted to troubleshoot it fully.

I can paste my /proc/pagetypeinfo contents here, but it's unlikely it would format correctly. I also don't think my file is unusual in any way that would make a difference.

There are also many lines like this, going back for many years. I don't know if it's related or if I should open another ticket, but I also have no idea how to troubleshoot this or how to obtain more info to troubleshoot it.

2020/12/14 12:35:12 [WARNING] 20 lines had errors while 2483 lines were correct in data from 'config meminfo' on cipher/209.216.11.60/4949

There's a huge amount of output from "munin-run meminfo config" but no obvious errors.

Ideas greatly appreciated, and I'll help to provide as much info as I can.

@sumpfralle
Collaborator

Interesting!

Could you show the output of munin-run meminfo, please?

I suspect that one of the fields contains the literal string >100000.
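
One way to test this suspicion is to filter the plugin output for values that are neither a plain number nor "U" (munin's marker for an unknown value). The following is a minimal sketch, not part of munin: the helper name find_non_numeric_values is made up for illustration, and it assumes the usual "<field>.value <data>" layout of fetch output.

```shell
# Hypothetical helper (not part of munin): read plugin fetch output on
# stdin and print every "<field>.value <data>" line whose data is
# neither a plain number nor "U" (unknown). Empty values are printed too.
find_non_numeric_values() {
    awk '
        $1 ~ /\.value$/ {
            v = $2
            if (v != "U" && v !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/)
                print
        }
    '
}
```

It could be used as "munin-run meminfo | find_non_numeric_values" on the node. Any line it prints is a candidate for the value that rrdtool refuses to convert.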

@dwreski
Author

dwreski commented Dec 16, 2020

Here is the output from one system attached here, although it happens on every system. A literal "100000" doesn't appear anywhere in the output or in the meminfo plugin. Also, "phisical" is spelled incorrectly throughout - it should be "physical"

munin-meminfo.txt

@sumpfralle
Collaborator

After taking a look at your data, I suspect that the master cannot handle 64-bit integer values. At least I noticed that around 20 values in your output are bigger than 2^32. Is the master a 32-bit host?

Knowing this would help my understanding of the problem. But it should of course be fixed, even if it is a 32-bit system ...

Also, "phisical" is spelled incorrectly throughout - it should be "physical"

Yes, that is an annoyance, but I am hesitant to fix this typo (being visible only in the filenames), since it would break the history for these graphs.

Regarding the line in your log:

2020/12/14 12:35:12 [WARNING] 20 lines had errors while 2483 lines were correct in data from 'config meminfo' on cipher/209.216.11.60/4949

Did you notice other errors or warnings just above this one? I assume that munin should emit a warning for each problematic line.

@sumpfralle
Collaborator

sumpfralle commented Dec 16, 2020

If there are no error messages, then please find the file Munin/Master/Node.pm and replace DEBUG with WARN in the following two lines:

  • DEBUG "[DEBUG] Protocol exception: unrecognized line '$line' from $plugin on $nodedesignation.\n";
  • DEBUG "[DEBUG] Protocol exception while fetching '$service' from $plugin on $nodedesignation: unrecognized line '$line'";

The next run of munin-update (every five minutes) should expose these interesting log messages.
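
The replacement described above can be sketched with sed. This is only an illustration: the helper name promote_protocol_exceptions is hypothetical, the location of Node.pm is not stated in this thread and has to be passed in, and the sketch assumes each of the two statements starts with DEBUG (after optional indentation) on a line containing "Protocol exception".

```shell
# Hypothetical helper: switch the two "Protocol exception" log calls from
# DEBUG to WARN. $1 is the path to Munin/Master/Node.pm (it varies by
# distribution, so it is an argument). The original is kept as "$1.bak".
promote_protocol_exceptions() {
    sed -i.bak '/Protocol exception/ s/^\([[:space:]]*\)DEBUG /\1WARN /' "$1"
}
```

Only the logging call changes. The message text itself still begins with "[DEBUG]", so the promoted lines keep that tag in the log.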

@dwreski
Author

dwreski commented Dec 16, 2020

This is a 64-bit host running fedora33.

Yes, each host which has the meminfo plugin running reports the same problems. Only the number of errors varies slightly between each host.

We have not noticed any other errors or warnings related to the meminfo plugin.

Here are the results for one host after making the DEBUG/WARN changes above.

2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_Acpi.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_anon.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_biovec.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_btrfs.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_caches.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_dma.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_dmaengine.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_ext4.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_jbd2.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_kmalloc.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_kmem.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_network.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_other.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_proc.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_request.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_skbuff.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_task.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [DEBUG] Protocol exception: unrecognized line 'slab_size_summ_xfs.info ' from meminfo on arcade/107.155.66.2/4949.
2020/12/16 14:29:53 [WARNING] 18 lines had errors while 2035 lines were correct in data from 'config meminfo' on arcade/107.155.66.2/4949
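
Every line flagged above is a field attribute followed only by a trailing space, i.e. an attribute without a value. A minimal sketch for listing such lines directly on the node follows. The helper name find_empty_attributes is made up for illustration, and the pattern for field and attribute names is an assumption rather than munin's own parser.

```shell
# Hypothetical helper: read plugin config output on stdin and print
# every "<field>.<attribute>" token that has no value after it.
find_empty_attributes() {
    awk 'NF == 1 && $1 ~ /^[A-Za-z_][A-Za-z0-9_]*\.[a-z_]+$/ { print $1 }'
}
```

It could be used as "munin-run meminfo config | find_empty_attributes". The number of lines printed should roughly match the error count in the [WARNING] line.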

@sumpfralle
Collaborator

Thanks for the result!

The line errors above are probably not related to the numeric problem, but I would like to fix them anyway. Please share the output of munin-run meminfo config. Then I will be able to fix them, I guess.

Regarding the number conversion problem: the above config dump will also help me to reproduce this.

@dwreski
Author

dwreski commented Dec 16, 2020

Requested info attached.
meminfo-config.txt

I also have the following output from the ntp_kernel_pll_freq and ntp_kernel_err plugins. Neither of these plugins works at all with ntp-4.2.8p15.

2020/12/16 16:35:15 [DEBUG] Protocol exception while fetching 'ntp_kernel_err' from ntp_kernel_err on arcade/107.155.66.2:4949: unrecognized line 'ntp_err.value '
2020/12/16 16:35:15 [WARNING] 1 lines had errors while 0 lines were correct (100.00%) in data from 'fetch ntp_kernel_err' on arcade/107.155.66.2:4949

2020/12/16 16:35:13 [DEBUG] Protocol exception while fetching 'ntp_kernel_pll_freq' from ntp_kernel_pll_freq on arcade/107.155.66.2:4949: unrecognized line 'ntp_pll_freq.value '
2020/12/16 16:35:13 [WARNING] 1 lines had errors while 0 lines were correct (100.00%) in data from 'fetch ntp_kernel_pll_freq' on arcade/107.155.66.2:4949
2020/12/16 16:35:15 [DEBUG] Protocol exception while fetching 'ntp_kernel_err' from ntp_kernel_err on arcade/107.155.66.2:4949: unrecognized line 'ntp_err.value '

@dwreski
Author

dwreski commented Feb 12, 2021

Still experiencing this issue - does anyone have any ideas?

2021/02/11 19:30:18 [ERROR] In RRD: Error updating /var/lib/munin/xavier/xavier-pagetypeinfo-n0_zNormal_tMovable-fp_n0_zNormal_tMovable_o0-g.rrd: /var/lib/munin/xavier/xavier-pagetypeinfo-n0_zNormal_tMovable-fp_n0_zNormal_tMovable_o0-g.rrd: Function update_pdp_prep, case DST_GAUGE - Cannot convert '>100000' to float

sumpfralle added a commit to munin-monitoring/munin that referenced this issue Feb 14, 2021
For example the meminfo plugin emits a few fields without content (just
a trailing space):

  slab_size_summ_Acpi.info
  slab_size_summ_anon.info

In order to simplify plugin writing, it is acceptable to treat these
input lines (with an unambiguous purpose: "no content") as valid.
This reduces the log noise, e.g.:

  [WARNING] 18 lines had errors while 2035 lines were correct in data from ...

See munin-monitoring/contrib#1156 (comment)

@sumpfralle
Collaborator

sumpfralle commented Feb 14, 2021

I took another look at the data you provided (the output of fetch and config).
I prepared a dummy plugin emitting this content locally:

#!/bin/sh

case "${1:-fetch}" in
    fetch)
        cat /root/munin-meminfo.txt
        ;;
    config)
        cat /root/meminfo-config.txt
        ;;
esac

Here on my system munin was happy to digest this input (emitting the [WARNING] x lines had errors while y lines were correct log message - just as it did for you).
By the way, I have now fixed the handling of empty fields (4759e06179d9cf4569448bd8b114b61b551705ed), so the log noise will go down in the future.

The munin-update procedure ran successfully, the rrd files were created/updated and the graphs were drawn.
I could not find error messages in /var/log/munin/munin-cgi-graph.log or /var/log/munin/munin-graph.log.

Thus it seems that the same input leads to errors on your side but is handled without issues on mine.
Maybe we should compare our environments?

$ uname -a
Linux foo 5.10.0-3-amd64 #1 SMP Debian 5.10.13-1 (2021-02-06) x86_64 GNU/Linux

$ dpkg -l | grep -E "(munin|rrd)" | awk '{print($1, $2, $3)}'
ii librrd8:amd64 1.7.2-3+b7
ii librrds-perl:amd64 1.7.2-3+b7
ii munin 2.0.66-1
ii munin-async 2.0.66-1
ii munin-common 2.0.66-1
ii munin-doc 2.0.66-1
ii munin-node 2.0.66-1
ii munin-plugins-core 2.0.66-1
ii munin-plugins-extra 2.0.66-1
ii rrdtool 1.7.2-3+b7

@dwreski
Author

dwreski commented Feb 14, 2021

Hi Lars, thanks for your help. The most obvious difference is that this is on Fedora, not Debian. Nearly the same kernel, though.

# uname -a
Linux foo 5.9.12-200.fc33.x86_64 #1 SMP Wed Dec 2 15:16:37 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qva | grep -E "(munin|rrd)"
rrdtool-perl-1.7.2-14.fc33.x86_64
rrdtool-1.7.2-14.fc33.x86_64
munin-common-2.0.65-1.fc33.noarch
munin-node-2.0.65-1.fc33.noarch
munin-apache-2.0.65-1.fc33.noarch
munin-2.0.65-1.fc33.noarch

This is from the munin node on which the plugin runs. On the munin server that collects from the munin nodes:

# uname -a
Linux foo 5.8.18-200.fc32.x86_64 #1 SMP Mon Nov 2 19:49:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qva | grep -E "(munin|rrd)"
munin-node-2.0.65-1.fc32.noarch
rrdtool-1.7.2-7.fc32.x86_64
munin-2.0.65-1.fc32.noarch
munin-common-2.0.65-1.fc32.noarch
rrdtool-perl-1.7.2-7.fc32.x86_64
munin-apache-2.0.65-1.fc32.noarch

@sumpfralle
Collaborator

Thanks for your information.
I was hoping for an old version of rrdtool :(

Maybe you could try to delete the rrd files causing these errors and see whether the problem appears again?
(I am a bit lost as to where the problem could be, so I am just guessing.)
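
A more cautious variant of this experiment is to move the affected files aside instead of deleting them, so that the graph history can be restored afterwards. The sketch below is an assumption-laden illustration: the helper name set_aside_pagetypeinfo_rrds is hypothetical, and the file name pattern is derived from the rrd paths in the error messages above.

```shell
# Hypothetical helper: move the pagetypeinfo rrd files of one host into
# a backup directory instead of deleting them.
# $1 = host data directory, e.g. /var/lib/munin/bwimail03 (from the log)
# $2 = backup directory (created if missing)
set_aside_pagetypeinfo_rrds() {
    mkdir -p "$2" &&
    find "$1" -type f -name '*-pagetypeinfo-*.rrd' -exec mv {} "$2"/ \;
}
```

The next munin-update run should recreate the missing files. If the error returns with fresh files, stale rrd definitions can be ruled out as the cause.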

@dwreski
Author

dwreski commented Feb 14, 2021

Sadly, that didn't fix it. There also aren't any fields in /proc/pagetypeinfo greater than 100,000 that would present a problem converting to float. I do see other references to "Function update_pdp_prep, case DST_GAUGE - Cannot convert '' to float" with regard to rrdtool, so perhaps it is an rrdtool bug?

I've attached my /proc/pagetypeinfo from one of the systems here.

pagetypeinfo.txt
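
A quick way to double-check a pagetypeinfo file for the literal string from the error message is to look for fields that start with ">". As far as I know, newer kernels cap the free page counts reported in /proc/pagetypeinfo and print a capped value with a leading ">" (for example ">100000"). Whether that applies to the kernels in this thread is an assumption and has not been confirmed here. The helper name find_capped_counts is made up for illustration.

```shell
# Hypothetical helper: print "<line number>: <field>" for every
# whitespace-separated field that is a number prefixed with ">".
# Arguments are the files to scan, e.g. /proc/pagetypeinfo.
find_capped_counts() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^>[0-9]+$/) print FNR ": " $i }' "$@"
}
```

It could be run as "find_capped_counts /proc/pagetypeinfo" (as root) or against the attached copy. If it prints nothing, the ">100000" string has to come from somewhere else.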

@github-actions

Stale issue message
