Telegraf 1.21.1: Marshal error with SNMP PowerNet MIB #10304

Closed
Doridian opened this issue Dec 18, 2021 · 13 comments · Fixed by #10322
Labels: area/snmp; bug (unexpected problem or unintended behavior); regression (something that used to work, but is now broken)

Comments

@Doridian

Doridian commented Dec 18, 2021

Relevant telegraf.conf

[[inputs.snmp]]
  agents = [ "SNIP" ]
  version = 2
  community = "SNIP"
  interval = "60s"
  timeout = "10s"
  retries = 3

  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EXPRESSION-MIB::sysUpTimeInstance"

  [[inputs.snmp.field]]
    name = "model"
    oid = "PowerNet-MIB::upsBasicIdentModel.0"

  [[inputs.snmp.field]]
    name = "name"
    oid = "PowerNet-MIB::upsBasicIdentName.0"

  [[inputs.snmp.field]]
    name = "upsBasicBatteryTimeOnBattery"
    oid = "PowerNet-MIB::upsBasicBatteryTimeOnBattery.0"

  [[inputs.snmp.field]]
    name = "upsAdvBatteryRunTimeRemaining"
    oid = "PowerNet-MIB::upsAdvBatteryRunTimeRemaining.0"

  [[inputs.snmp.field]]
    name = "upsAdvBatteryReplaceIndicator"
    oid = "PowerNet-MIB::upsAdvBatteryReplaceIndicator.0"

  [[inputs.snmp.field]]
    name = "upsHighPrecBatteryCapacity"
    oid = "PowerNet-MIB::upsHighPrecBatteryCapacity.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsHighPrecBatteryTemperature"
    oid = "PowerNet-MIB::upsHighPrecBatteryTemperature.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsBasicOutputStatus"
    oid = "PowerNet-MIB::upsBasicOutputStatus.0"

  [[inputs.snmp.field]]
    name = "upsHighPrecOutputLoad"
    oid = "PowerNet-MIB::upsHighPrecOutputLoad.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsHighPrecOutputEfficiency"
    oid = "PowerNet-MIB::upsHighPrecOutputEfficiency.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsHighPrecOutputVoltage"
    oid = "PowerNet-MIB::upsHighPrecOutputVoltage.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsHighPrecInputLineVoltage"
    oid = "PowerNet-MIB::upsHighPrecInputLineVoltage.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsHighPrecOutputCurrent"
    oid = "PowerNet-MIB::upsHighPrecOutputCurrent.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsHighPrecOutputEnergyUsage"
    oid = "PowerNet-MIB::upsHighPrecOutputEnergyUsage.0"
    conversion = "float(1)"

  [[inputs.snmp.field]]
    name = "upsAdvOutputActivePower"
    oid = "PowerNet-MIB::upsAdvOutputActivePower.0"
    conversion = "float(0)"

  [[inputs.snmp.field]]
    name = "upsAdvOutputApparentPower"
    oid = "PowerNet-MIB::upsAdvOutputApparentPower.0"
    conversion = "float(0)"

System info

Telegraf 1.21.1

Docker

No response

Steps to reproduce

  1. Put this config in
  2. Wait for it to pull
  3. Observe error 2021-12-18T08:06:00Z E! [inputs.snmp] Error in plugin: agent SNIP: performing get on field model: marshal: marshalPDU: unable to marshal varbind list: unable to marshal OID: Invalid object identifier

Expected behavior

It works

Actual behavior

2021-12-18T08:06:00Z E! [inputs.snmp] Error in plugin: agent SNIP: performing get on field model: marshal: marshalPDU: unable to marshal varbind list: unable to marshal OID: Invalid object identifier

Additional info

This uses the PowerNet MIB from APC's website: https://www.apc.com/shop/us/en/products/PowerNet-MIB-v4-3-2/P-SFPMIB432
There are no errors loading the MIB, and when I play around with gosmi's command-line tools and small test programs, it seems to translate the descriptions to OIDs correctly.

This used to work before the switch to gosmi. Other MIBs (such as UPS-MIB) work fine.

Also worth noting: if I replace the string representations (PowerNet-MIB::...) with their full numeric representations, these errors vanish.

@Doridian Doridian added the bug unexpected problem or unintended behavior label Dec 18, 2021
@Derek-K

Derek-K commented Dec 18, 2021

+1!! I just came across this post. I wish I had found it sooner; it would have saved my afternoon of troubleshooting!

I was scratching my head trying to figure out what's wrong as well!!

I found the actual error message is generated by "gosnmp" and I opened a ticket over there

gosnmp/gosnmp#388

@Doridian
Author

> +1!! I just came across this post. I wish I had found it sooner; it would have saved my afternoon of troubleshooting!
>
> I was scratching my head trying to figure out what's wrong as well!!
>
> I found the actual error message is generated by "gosnmp" and I opened a ticket over there
>
> gosnmp/gosnmp#388

I am unsure if the actual issue is in gosnmp, or in the way telegraf interacts with it, which is why I opened a ticket here first.
Hope there's a reply from a contributor soon. I've gone down the rabbit hole trying to poke deeper, but I just can't wrap my head around exactly why this fails.

@Derek-K

Derek-K commented Dec 18, 2021

@Doridian same here... I went down the rabbit hole trying to figure out what's wrong as well.

This leads me to believe it has something to do with gosnmp.

Yeah, I hope someone from the team can take a look and get back to us. For the longest time I was wondering what "marshalPDU" meant. In my world PDU stands for Power Distribution Unit, so I thought it was some error in the MIB files, and I couldn't find any "marshal" brand info... 😅

@MyaLongmire
Contributor

@Doridian Thank you for noticing this! I am trying my best to get to the bottom of it. I will keep you both updated, and I appreciate your patience as well as the ticket you opened with gosnmp :)

@MyaLongmire MyaLongmire self-assigned this Dec 19, 2021
@srebhan srebhan added the regression something that used to work, but is now broken label Dec 20, 2021
@MyaLongmire
Contributor

I do not believe this is an issue with gosnmp but with gosmi. I have opened an issue in the appropriate library asking for some direction on how to solve the issue. As always, thank you for your patience and I will keep you updated on the status :)

@MyaLongmire
Contributor

With the holidays coming up, the team will not have time for another bug-fix release. There are a few options while you wait for this fix:

  1. Roll back to 1.20.4
  2. Grab a nightly build: the snmp panic fix has been merged into master, but we are still diligently working on the marshal error
  3. Clone master

Thank you for your continued support of Telegraf and for your patience with this switch to gosmi!

@MyaLongmire MyaLongmire changed the title Telegraf 1.21.1: Issues with SNMP PowerNet MIB Telegraf 1.21.1: Marshal error with SNMP PowerNet MIB Dec 21, 2021
@MyaLongmire
Contributor

The library owner of gosmi got back to me on this issue. The marshal error is coming from RenderNumber(), which should return the numeric OIDs of the MIB nodes. It is returning blank, and that is why the marshal error says the OID is invalid. He explains that a blank result means the node is disconnected. I am going to add some error checking so the error will be clearer than the marshal error. Would you mind making sure you have all of the imported MIBs in your path and trying again?

@Doridian
Author

@MyaLongmire The odd thing is, I copied some of the gosmi code from telegraf into a standalone script to play around with. If I load that up, it comes back with the OID just fine; only telegraf, for some reason, does not. Maybe I load the folders in a different order, but the MIBs must exist somehow. I can try fiddling some more to see if I can get a minimal code snippet that actually reproduces the blank result.
I just put my PowerNet MIB in the OS MIB folder, so it loads with all the other things from snmp-mibs-downloader.

@MyaLongmire
Contributor

I tried to simplify telegraf code just for testing gosmi. I have attached the code as a txt. It still comes up blank for me. Thank you for helping me test this. I am unsure of why the order would matter but will play with how we traverse the folder and try to come up with a solution. I am also open to any ideas you have as to why this happens.
synology.txt

@Doridian
Author

Doridian commented Dec 21, 2021

I finally managed to reproduce the issue. It is order dependent!
See my attached file (I prefixed all the paths to my MIBS with /mnt as this is a minimal test container).
Loading the MIBs in this order (/usr/share first, then /var/lib) produces a blank output.
Loading the MIBs in the other order (/var/lib first, then /usr/share) produces correct output.
The loading code is copied from telegraf directly.
test.txt
This might be a load-order issue.
If you load PowerNet before the RFC/etc MIBs, it fails, maybe?
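A toy Go model of the suspected load-order dependence, under loud assumptions: it does not use gosmi, `toyModule` and `disconnectedAfterInterleavedLoad` are made-up names, and placing PowerNet-MIB in /usr/share with a dependency (called RFC1155-SMI here purely for illustration) in /var/lib is a guess at the layout, not something confirmed in this thread. It only shows how loading each directory's modules right after adding that directory can leave an import unresolved.

```go
package main

import "fmt"

// toyModule is a made-up record: a module, the directory holding its file,
// and the single module it imports ("" if none).
type toyModule struct {
	name, dir, imports string
}

// disconnectedAfterInterleavedLoad adds one directory to the search path,
// loads that directory's modules, then moves on to the next directory.
// It returns the modules whose import was not yet findable at load time.
func disconnectedAfterInterleavedLoad(mods []toyModule, dirs []string) []string {
	dirOf := map[string]string{}
	for _, m := range mods {
		dirOf[m.name] = m.dir
	}
	onPath := map[string]bool{}
	var bad []string
	for _, dir := range dirs {
		onPath[dir] = true
		for _, m := range mods {
			if m.dir != dir {
				continue
			}
			// Assumed failure mode: the import lives in a directory
			// that has not been added to the path yet.
			if m.imports != "" && !onPath[dirOf[m.imports]] {
				bad = append(bad, m.name)
			}
		}
	}
	return bad
}

func main() {
	mods := []toyModule{
		{name: "PowerNet-MIB", dir: "/usr/share", imports: "RFC1155-SMI"},
		{name: "RFC1155-SMI", dir: "/var/lib"},
	}
	fmt.Println(disconnectedAfterInterleavedLoad(mods, []string{"/usr/share", "/var/lib"}))
	fmt.Println(disconnectedAfterInterleavedLoad(mods, []string{"/var/lib", "/usr/share"}))
}
```

In this model only the first ordering reports a disconnected module, which matches the behavior reported above, though the real cause inside gosmi may differ.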

@MyaLongmire
Contributor

I just found that if you put them in the same folder it works. Any idea on the best order to load them in?

@Doridian
Author

I think I found a possible fix for this issue. I have changed the function so that it first appends all paths to gosmi, and only at the end makes the LoadModule calls.
This makes it load successfully in any order.

See attached file.
test.txt
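The approach described in this comment can be sketched in Go as follows. This is not the attached test.txt and not Telegraf code: `mibLoader` is a hypothetical interface standing in for gosmi's path-append and module-load calls, and `loadAll`, `recorder`, and the directory and module names are illustrative only.

```go
package main

import "fmt"

// mibLoader is a hypothetical stand-in for the two operations discussed
// here: adding a search directory and loading a module by name.
type mibLoader interface {
	AppendPath(dir string)
	LoadModule(name string) error
}

// loadAll registers every directory first and only then loads modules,
// so each import can be resolved regardless of directory order.
func loadAll(l mibLoader, dirs []string, modulesByDir map[string][]string) error {
	// Phase 1: make every directory searchable.
	for _, dir := range dirs {
		l.AppendPath(dir)
	}
	// Phase 2: load the modules found in each directory.
	for _, dir := range dirs {
		for _, mod := range modulesByDir[dir] {
			if err := l.LoadModule(mod); err != nil {
				return fmt.Errorf("loading %s from %s: %w", mod, dir, err)
			}
		}
	}
	return nil
}

// recorder is a fake loader that logs the order of calls it receives.
type recorder struct{ calls []string }

func (r *recorder) AppendPath(dir string) { r.calls = append(r.calls, "path:"+dir) }

func (r *recorder) LoadModule(name string) error {
	r.calls = append(r.calls, "load:"+name)
	return nil
}

func main() {
	r := &recorder{}
	dirs := []string{"dirA", "dirB"}
	mods := map[string][]string{"dirA": {"ModuleA"}, "dirB": {"ModuleB"}}
	if err := loadAll(r, dirs, mods); err != nil {
		fmt.Println("error:", err)
		return
	}
	// Every "path:" entry precedes every "load:" entry.
	fmt.Println(r.calls)
}
```

Keeping the loader behind an interface is a design choice for this sketch only: it lets the call ordering be checked with a fake, without needing MIB files on disk.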

@MyaLongmire
Contributor

MyaLongmire commented Dec 21, 2021

The PR for this is now up! Could you give it a test just to be sure it works? Thank you so much for all of your help with this!


4 participants