Telegraf 1.21.0 - go panic with SNMP plugin #10298
Comments
Can you please provide your config as well? |
I have 300+ .conf files representing each device I monitor. The telegraf.conf is pretty basic
|
The config causing the panic will be in one of the snmp configs, so the above doesn't help. It would be very helpful to have a config that reproduces this. The panic itself is from: https://github.com/influxdata/telegraf/blob/master/internal/snmp/translate.go#L153 Looks like it is splitting the @MyaLongmire thoughts? |
Please post any of the OIDs you are monitoring that have the |
If you can run the artifacts from this PR, it should error instead of panicking and tell you which MIB it is having trouble parsing. |
I don't know if this is the same issue, but I see similarities and thought I would post it here. Let me know if I should post it as a new issue, or if you would like more information. This same telegraf.conf has been used for around a year without any issue with previous Telegraf versions. Thanks for your help and for working on a great product. Telegraf installation on a Raspberry Pi running Ubuntu. Relevant logs:
I'm only using the snmp plugin to monitor a Ubiquiti EdgeRouter X, and here is the relevant configuration:
##
## EdgeRouter devices
##
[[inputs.snmp]]
# List of agents to poll
agents = [ "192.168.x.x"]
# Polling interval
interval = "60s"
# Timeout for each SNMP query.
timeout = "5s"
# Number of retries to attempt within timeout.
retries = 3
# SNMP version, values can be 1, 2, or 3
version = 2
# SNMP community string.
community = "home-erx"
# The GETBULK max-repetitions parameter
max_repetitions = 50
# Measurement name
name = "snmp.EdgeOS"
##
## Exclusions
##
# Don't want these columns from UCD-SNMP-MIB::laTable
fielddrop = [ "laErrorFlag", "laErrMessage" ]
# Don't want these rows from UCD-DISKIO-MIB::diskIOTable
[inputs.snmp.tagdrop]
diskIODevice = [ "loop*", "ram*" ]
##
## System details
##
# System name (hostname)
[[inputs.snmp.field]]
name = "sysName"
oid = "SNMPv2-MIB::sysName.0"
is_tag = true
# System vendor OID
[[inputs.snmp.field]]
name = "sysObjectID"
oid = "SNMPv2-MIB::sysObjectID.0"
# System description
[[inputs.snmp.field]]
name = "sysDescr"
oid = "SNMPv2-MIB::sysDescr.0"
# System contact
[[inputs.snmp.field]]
name = "sysContact"
oid = "SNMPv2-MIB::sysContact.0"
# System location
[[inputs.snmp.field]]
name = "sysLocation"
oid = "SNMPv2-MIB::sysLocation.0"
##
## Host/System Resources
##
# System uptime
[[inputs.snmp.field]]
name = "sysUpTime"
oid = "HOST-RESOURCES-MIB::hrSystemUptime.0"
# Number of user sessions
[[inputs.snmp.field]]
name = "hrSystemNumUsers"
oid = "HOST-RESOURCES-MIB::hrSystemNumUsers.0"
# Number of process contexts
[[inputs.snmp.field]]
name = "hrSystemProcesses"
oid = "HOST-RESOURCES-MIB::hrSystemProcesses.0"
# Device Listing
[[inputs.snmp.table]]
oid = "HOST-RESOURCES-MIB::hrDeviceTable"
[[inputs.snmp.table.field]]
oid = "HOST-RESOURCES-MIB::hrDeviceIndex"
is_tag = true
##
## Context Switches & Interrupts
##
# Number of interrupts processed
[[inputs.snmp.field]]
name = "ssRawInterrupts"
oid = "UCD-SNMP-MIB::ssRawInterrupts.0"
# Number of context switches
[[inputs.snmp.field]]
name = "ssRawContexts"
oid = "UCD-SNMP-MIB::ssRawContexts.0"
##
## Host performance metrics
##
# System Load Average
[[inputs.snmp.table]]
oid = "UCD-SNMP-MIB::laTable"
[[inputs.snmp.table.field]]
oid = "UCD-SNMP-MIB::laNames"
is_tag = true
##
## CPU inventory
##
# Processor listing
[[inputs.snmp.table]]
index_as_tag = true
oid = "HOST-RESOURCES-MIB::hrProcessorTable"
##
## CPU utilization
##
# Number of 'ticks' spent on user-level
[[inputs.snmp.field]]
name = "ssCpuRawUser"
oid = "UCD-SNMP-MIB::ssCpuRawUser.0"
# Number of 'ticks' spent on reduced-priority
[[inputs.snmp.field]]
name = "ssCpuRawNice"
oid = "UCD-SNMP-MIB::ssCpuRawNice.0"
# Number of 'ticks' spent on system-level
[[inputs.snmp.field]]
name = "ssCpuRawSystem"
oid = "UCD-SNMP-MIB::ssCpuRawSystem.0"
# Number of 'ticks' spent idle
[[inputs.snmp.field]]
name = "ssCpuRawIdle"
oid = "UCD-SNMP-MIB::ssCpuRawIdle.0"
# Number of 'ticks' spent waiting on I/O
[[inputs.snmp.field]]
name = "ssCpuRawWait"
oid = "UCD-SNMP-MIB::ssCpuRawWait.0"
# Number of 'ticks' spent in kernel
[[inputs.snmp.field]]
name = "ssCpuRawKernel"
oid = "UCD-SNMP-MIB::ssCpuRawKernel.0"
# Number of 'ticks' spent on hardware interrupts
[[inputs.snmp.field]]
name = "ssCpuRawInterrupt"
oid = "UCD-SNMP-MIB::ssCpuRawInterrupt.0"
# Number of 'ticks' spent on software interrupts
[[inputs.snmp.field]]
name = "ssCpuRawSoftIRQ"
oid = "UCD-SNMP-MIB::ssCpuRawSoftIRQ.0"
##
## System Memory (physical/virtual)
##
# Size of physical memory (RAM)
[[inputs.snmp.field]]
name = "hrMemorySize"
oid = "HOST-RESOURCES-MIB::hrMemorySize.0"
# Size of real/phys mem installed
[[inputs.snmp.field]]
name = "memTotalReal"
oid = "UCD-SNMP-MIB::memTotalReal.0"
# Size of real/phys mem unused/avail
[[inputs.snmp.field]]
name = "memAvailReal"
oid = "UCD-SNMP-MIB::memAvailReal.0"
# Total amount of mem unused/avail
[[inputs.snmp.field]]
name = "memTotalFree"
oid = "UCD-SNMP-MIB::memTotalFree.0"
# Size of mem used as shared memory
[[inputs.snmp.field]]
name = "memShared"
oid = "UCD-SNMP-MIB::memShared.0"
# Size of mem used for buffers
[[inputs.snmp.field]]
name = "memBuffer"
oid = "UCD-SNMP-MIB::memBuffer.0"
# Size of mem used for cache
[[inputs.snmp.field]]
name = "memCached"
oid = "UCD-SNMP-MIB::memCached.0"
##
## Block (Disk) performance
##
# System-wide blocks written
[[inputs.snmp.field]]
name = "ssIORawSent"
oid = "UCD-SNMP-MIB::ssIORawSent.0"
# Number of blocks read
[[inputs.snmp.field]]
name = "ssIORawReceived"
oid = "UCD-SNMP-MIB::ssIORawReceived.0"
# Per-device (disk) performance
[[inputs.snmp.table]]
oid = "UCD-DISKIO-MIB::diskIOTable"
[[inputs.snmp.table.field]]
oid = "UCD-DISKIO-MIB::diskIODevice"
is_tag = true
##
## Disk/Partition/Filesystem inventory & usage
##
# Storage listing
[[inputs.snmp.table]]
oid = "HOST-RESOURCES-MIB::hrStorageTable"
[[inputs.snmp.table.field]]
oid = "HOST-RESOURCES-MIB::hrStorageDescr"
is_tag = true
##
## Interface metrics
##
# Per-interface traffic, errors, drops
[[inputs.snmp.table]]
oid = "IF-MIB::ifTable"
[[inputs.snmp.table.field]]
oid = "IF-MIB::ifName"
is_tag = true
# Per-interface high-capacity (HC) counters
[[inputs.snmp.table]]
oid = "IF-MIB::ifXTable"
[[inputs.snmp.table.field]]
oid = "IF-MIB::ifName"
is_tag = true
##
## IP metrics
##
# System-wide IP metrics
[[inputs.snmp.table]]
index_as_tag = true
oid = "IP-MIB::ipSystemStatsTable"
##
## ICMP Metrics
##
# ICMP statistics
[[inputs.snmp.table]]
index_as_tag = true
oid = "IP-MIB::icmpStatsTable"
# ICMP per-type statistics
[[inputs.snmp.table]]
index_as_tag = true
oid = "IP-MIB::icmpMsgStatsTable"
##
## UDP statistics
##
# Datagrams delivered to app
[[inputs.snmp.field]]
name = "udpInDatagrams"
oid = "UDP-MIB::udpInDatagrams.0"
# Datagrams received with no app
[[inputs.snmp.field]]
name = "udpNoPorts"
oid = "UDP-MIB::udpNoPorts.0"
# Datagrams received with error
[[inputs.snmp.field]]
name = "udpInErrors"
oid = "UDP-MIB::udpInErrors.0"
# Datagrams sent
[[inputs.snmp.field]]
name = "udpOutDatagrams"
oid = "UDP-MIB::udpOutDatagrams.0"
##
## TCP statistics
##
# Number of CLOSED -> SYN-SENT transitions
[[inputs.snmp.field]]
name = "tcpActiveOpens"
oid = "TCP-MIB::tcpActiveOpens.0"
# Number of LISTEN -> SYN-RCVD transitions
[[inputs.snmp.field]]
name = "tcpPassiveOpens"
oid = "TCP-MIB::tcpPassiveOpens.0"
# Number of SYN-SENT/RCVD -> CLOSED transitions
[[inputs.snmp.field]]
name = "tcpAttemptFails"
oid = "TCP-MIB::tcpAttemptFails.0"
# Number of ESTABLISHED/CLOSE-WAIT -> CLOSED transitions
[[inputs.snmp.field]]
name = "tcpEstabResets"
oid = "TCP-MIB::tcpEstabResets.0"
# Number of ESTABLISHED or CLOSE-WAIT
[[inputs.snmp.field]]
name = "tcpCurrEstab"
oid = "TCP-MIB::tcpCurrEstab.0"
# Number of segments received
[[inputs.snmp.field]]
name = "tcpInSegs"
oid = "TCP-MIB::tcpInSegs.0"
# Number of segments sent
[[inputs.snmp.field]]
name = "tcpOutSegs"
oid = "TCP-MIB::tcpOutSegs.0"
# Number of segments retransmitted
[[inputs.snmp.field]]
name = "tcpRetransSegs"
oid = "TCP-MIB::tcpRetransSegs.0"
# Number of segments received with error
[[inputs.snmp.field]]
name = "tcpInErrs"
oid = "TCP-MIB::tcpInErrs.0"
# Number of segments sent w/RST
[[inputs.snmp.field]]
name = "tcpOutRsts"
oid = "TCP-MIB::tcpOutRsts.0"
##
## IP routing statistics
##
# Number of valid routing entries
[[inputs.snmp.field]]
name = "inetCidrRouteNumber"
oid = "IP-FORWARD-MIB::inetCidrRouteNumber.0"
# Number of valid entries discarded
[[inputs.snmp.field]]
name = "inetCidrRouteDiscards"
oid = "IP-FORWARD-MIB::inetCidrRouteDiscards.0"
# Number of valid forwarding entries
[[inputs.snmp.field]]
name = "ipForwardNumber"
oid = "IP-FORWARD-MIB::ipForwardNumber.0"
##
## IP routing statistics
##
# Number of valid routes discarded
[[inputs.snmp.field]]
name = "ipRoutingDiscards"
oid = "RFC1213-MIB::ipRoutingDiscards.0"
##
## SNMP metrics
##
# Number of SNMP messages received
[[inputs.snmp.field]]
name = "snmpInPkts"
oid = "SNMPv2-MIB::snmpInPkts.0"
# Number of SNMP Get-Request received
[[inputs.snmp.field]]
name = "snmpInGetRequests"
oid = "SNMPv2-MIB::snmpInGetRequests.0"
# Number of SNMP Get-Next received
[[inputs.snmp.field]]
name = "snmpInGetNexts"
oid = "SNMPv2-MIB::snmpInGetNexts.0"
# Number of SNMP objects requested
[[inputs.snmp.field]]
name = "snmpInTotalReqVars"
oid = "SNMPv2-MIB::snmpInTotalReqVars.0"
# Number of SNMP Get-Response received
[[inputs.snmp.field]]
name = "snmpInGetResponses"
oid = "SNMPv2-MIB::snmpInGetResponses.0"
# Number of SNMP messages sent
[[inputs.snmp.field]]
name = "snmpOutPkts"
oid = "SNMPv2-MIB::snmpOutPkts.0"
# Number of SNMP Get-Request sent
[[inputs.snmp.field]]
name = "snmpOutGetRequests"
oid = "SNMPv2-MIB::snmpOutGetRequests.0"
# Number of SNMP Get-Next sent
[[inputs.snmp.field]]
name = "snmpOutGetNexts"
oid = "SNMPv2-MIB::snmpOutGetNexts.0"
# Number of SNMP Get-Response sent
[[inputs.snmp.field]]
name = "snmpOutGetResponses"
oid = "SNMPv2-MIB::snmpOutGetResponses.0" |
I use these MIBs I can try the custom build monday, /Henrik |
We have this panic on a similar mib, unfortunately, it is an issue in the new |
I have tried to implement what the owner of the library recommended. If you wouldn't mind testing this draft pr and giving some feedback, it would be greatly appreciated :) |
Same problem with v1.21.1, both in arm (Raspberry) and amd64 (Ubuntu) |
@jordipalet did you try the pr listed above? |
@MyaLongmire it still appears to panic with your PR build at least for me (using the synology mibs).
Attached is the MIB it seems to be failing on. |
@dnewsholme Thank you for attaching your mib! I will try my best to figure out the issue. |
Similar issue here with the SNMP polling. I have not updated to the pr listed above, yet. Dec 20 18:54:35 pi3b systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT |
@dnewsholme I see the mibs you posted are not the synology mibs. Would you mind posting your config that goes with the mibs you posted? |
@MyaLongmire You are correct. Looking at the issue you posted, it seems Telegraf was loading the MIBs from the IETF bundle. I've now removed all the RFC ones and only have the MIBs specified in my config loading. I do get a different error, and I'm unable to see the problem MIB.
Attached are the config and the MIBs |
Thank you for attaching your files. I will investigate and get back to you :) |
I ran your config with your mibs. I had to comment out |
Due to the holidays coming up the team will not have time to have another bug release. There are a few options while you wait for this fix:
Thank you for your continued support of Telegraf and for your patience with this switch to |
It would be nice if this were left open til the release, so that we would get notifications when the release happens. |
Could someone share with me how to roll back on Debian? I've tried a few different ways and am not having any luck...
sudo apt-get install telegraf_1.20.4-1
This is the error I receive for each one of these:
Reading package lists... Done
Thank you! |
Looks like you might not have the repository added for telegraf? The easiest way would be to download the .deb and install it via dpkg:
wget https://dl.influxdata.com/telegraf/releases/telegraf_1.20.4-1_amd64.deb
dpkg -i telegraf_1.20.4-1_amd64.deb
If you would prefer to manage it via apt, make sure you add the source: https://docs.influxdata.com/telegraf/v1.21/introduction/installation/#ubuntu--debian |
Life saver! Thank you! |
curl -LO -C - https://dl.influxdata.com/telegraf/releases/telegraf_1.20.4-1_armhf.deb |
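The rollback steps posted in this thread can be collected into one small script. This is a minimal sketch, not an official procedure: the URL pattern, the `dpkg -i` install, and the `apt-mark hold` step are taken from commands posted in this thread, while the function names `telegraf_deb_url` and `telegraf_rollback_cmds` and the print-only design are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: print the commands for rolling Telegraf back to a pinned .deb release.
# The URL pattern and package name come from commands posted in this thread;
# the function names and the print-only (dry-run) design are assumptions.

# Build the download URL for a version (e.g. 1.20.4-1) and a Debian
# architecture (e.g. amd64 or armhf).
telegraf_deb_url() {
    printf 'https://dl.influxdata.com/telegraf/releases/telegraf_%s_%s.deb\n' "$1" "$2"
}

# Print the rollback steps instead of running them, so they can be reviewed first.
telegraf_rollback_cmds() {
    version=$1
    arch=$2
    url=$(telegraf_deb_url "$version" "$arch")
    printf 'curl -LO -C - %s\n' "$url"
    # ${url##*/} strips everything up to the last slash, leaving the file name.
    printf 'sudo dpkg -i %s\n' "${url##*/}"
    printf 'sudo apt-mark hold telegraf\n'
}

telegraf_rollback_cmds 1.20.4-1 amd64
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; once the output looks right, it can be piped to `sh`. Passing `armhf` as the second argument gives the Raspberry Pi variant.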
v1.21.2 which contains numerous SNMP fixes is now out. Thanks! |
thanks @MyaLongmire & @powersj , just updated on my Pi (again from 1.20.4) and it looks good. |
Even with this version, I'm still getting "parse module" errors.
Is there something additional to be updated? |
Hi @er-rhorman , I have the same but at least the plugin keeps running now, which is a great improvement compared to the earlier version. And, at least, I don't need these two MIB files. |
While I can see it fetching data (via SNMP), I can see it failing when Grafana calls InfluxDB for the data. When I roll back to 1.20.4, everything works. Here is my output:
When I roll back, this is what the output is and it works without issue.
2022-01-06T13:21:19Z I! Starting Telegraf 1.20.4
|
In your output above you are running in test mode. This means that outputs are not run and you will not see any data get pushed to InfluxDB:
|
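The difference between test mode and a normal run can be sketched as a small helper that prints the matching command line. This is an illustration only: the `--test` behaviour is as described in the comment above, the `--once` mode (gather once and also write to the configured outputs) is an assumption to verify against `telegraf --help` for the installed version, and the function name `telegraf_cmd` and its mode names are hypothetical.

```shell
#!/bin/sh
# Sketch: print the Telegraf invocation for a given purpose.
# --test gathers once and prints metrics to stdout without running outputs.
# --once is ASSUMED to gather once and flush to the configured outputs;
# confirm with `telegraf --help` before relying on it.
# The mode names below are hypothetical labels, not Telegraf terminology.
telegraf_cmd() {
    mode=$1
    config=$2
    case "$mode" in
        inputs-only)  printf 'telegraf --test --config %s\n' "$config" ;;
        single-write) printf 'telegraf --once --config %s\n' "$config" ;;
        service)      printf 'telegraf --config %s\n' "$config" ;;
        *)            printf 'unknown mode: %s\n' "$mode" >&2; return 1 ;;
    esac
}

telegraf_cmd inputs-only emu-rch1.conf
telegraf_cmd service emu-rch1.conf
```

Only the last two forms would be expected to produce new points in InfluxDB, so an empty database after a `--test` run does not by itself indicate a regression.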
In both examples above, I'm running the command "telegraf --test --config emu-rch1.conf" |
The |
I believe I have started to home in on the issue. InfluxDB is working, but since upgrading Telegraf to 1.21.2, it doesn't seem to be writing to the DB any longer. When I check the DB, the most recent timestamps are not changing. So it seems like Telegraf is not running on its polling interval. |
@er-rhorman feel free to file a new issue with all the necessary details. |
My issue still persists - I am getting a reboot loop with the snmp change in 1.21.2:
My MIB dir for this module: |
OK, my formatting skills in GitHub are non-existent... |
I had similar problems, so I downgraded to 1.20.4. This works in 1.20.4, but not in the latest version:
[[inputs.snmp]]
agents = [ "10.10.10.15" ]
version = 2
community = "public"
# interval = "300s"
timeout = "180s"
retries = 3
name = "Lantronix.SLP"
[[inputs.snmp.field]]
name = "Lantronix.SLP"
oid = "RFC1213-MIB::sysName.0"
is_tag = true
# SLP1-MIB:slp1InfeedTable Input Feed Table
[[inputs.snmp.table]]
name = "InFeedTable"
inherit_tags = [ "Lantronix.SLP" ]
oid = "SLP1-MIB::slp1InfeedTable"
# In Feed ID tag - used to identify the SLP
[[inputs.snmp.table.field]]
name = "InFeedID"
oid = "SLP1-MIB::slp1InfeedID"
is_tag = true
# SLP1-MIB:slp1OutletTable Outlet Table
[[inputs.snmp.table]]
name = "OutletTable"
inherit_tags = [ "Lantronix.SLP" ]
oid = "SLP1-MIB::slp1OutletTable"
# Outlet Index tag - used to identify the Outlet
[[inputs.snmp.table.field]]
name = "OutletID"
oid = "SLP1-MIB::slp1OutletID"
is_tag = true
# SLP1-MIB:slp1TempHumidSensorTable Temperature/Humidity Sensor Table
[[inputs.snmp.table]]
name = "TempHumidSensorTable"
inherit_tags = [ "Lantronix.SLP" ]
oid = "SLP1-MIB::slp1TempHumidSensorTable"
# Sensor Index tag - used to identify the Sensor
[[inputs.snmp.table.field]]
name = "SensorID"
oid = "SLP1-MIB::slp1TempHumidSensorID"
is_tag = true |
yeah I forgot to mention, fixed by downgrading to 1.20.4 as well |
Same for me. Issue persisted in 1.21.2. Downgrading to 1.20.4 fixed it. |
Can you please open a new issue for the boot loop? The warning about a MIB not being loaded should not interfere with the running of Telegraf. These warnings appear because the file could not be found or the MIB has syntax issues that prevent it from being parsed correctly. Please refer to this comment from the |
I was seeing similar behavior with 1.21.2 which also was fixed by downgrading to 1.20.4. Would it be crazy to suggest that gosmi be downgraded to a version that doesn't exhibit this behavior? It would seem that its prior behavior (however incorrect) was relied on by multiple people, so I'd consider that more of a feature than a bug. |
FWIW this issue is still VERY much a problem - I've got issues across a wide MIB list using v1.21.3-1. The current fix (for me, and others) was downgrading to 1.20.4-1 on my Ubuntu machine via:
curl -LO -C - https://dl.influxdata.com/telegraf/releases/telegraf_1.20.4-1_amd64.deb
Be sure to set a hold on telegraf too!
sudo apt-mark hold telegraf
Once I downgraded, my data was once again making it to Influx & I was able to visualize my data in Grafana again |
Thank you, it was fixed with [pull 10299] |
Relevant telegraf.conf
System info
Telegraf 1.21.1
Docker
No response
Steps to reproduce
Expected behavior
to work like 1.20.4
Actual behavior
crashes
Additional info
Not sure if I have a specific snmp MIB file that causes this, I cannot see any reference in the trace what causes it besides snmptranslate