Prometheus exporter for SMART and OCP C0 Log Page #2189
Comments
If I understand you correctly, you would like to have a command which fuses the output of both log pages. I don't think we should mingle the existing commands. Though first we need to figure out if we should solve this on the level of nvme-cli (I suppose this is your question about a roadmap). Anyway, couldn't this be something on top of nvme-cli which does the right thing? Technically, we could solve this as yet another plugin. Thoughts? |
I wrote a sample exporter with ChatGPT (not very good, but it works) |
Thanks for the Python code, it helps to understand what you really need as input for the integration. I think we should first define a schema. Could you append a complete JSON-formatted one here? |
Sure, for the POC I just had pretty much everything that wasn't static be a |
There are several GitHub projects out there based on the parsing of the I think that to avoid yet another exporter that is aligned with neither the NVMe specifications nor Prometheus:
To calculate things like WAF, that do not have directly an opcode, it could be possible to use Prometheus recording rules. |
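To make the WAF calculation concrete, here is a minimal Python sketch of the arithmetic that a recording rule (or an exporter) would perform. It assumes, as I read the specifications, that the OCP C0 "Physical media units written" field is a byte count and that the smart-log `data_units_written` field counts units of 1000 × 512 bytes. The function and constant names are made up for illustration.

```python
# Assumption: one smart-log "data unit" is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000


def write_amplification(physical_media_bytes_written: int,
                        data_units_written: int) -> float:
    """Return NAND bytes written divided by host bytes written.

    physical_media_bytes_written: OCP C0 "Physical media units written"
        (assumed to be in bytes).
    data_units_written: smart-log data_units_written (assumed to be in
        units of 512,000 bytes).
    """
    host_bytes = data_units_written * BYTES_PER_DATA_UNIT
    if host_bytes == 0:
        # No host writes recorded yet, so the ratio is undefined.
        return float("nan")
    return physical_media_bytes_written / host_bytes
```

Feeding it lifetime counters gives the lifetime WAF; feeding it the difference between two samples gives the WAF of the workload that ran in between.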
Node exporter is great and would be a good place to add this. It would be nice to have the flexibility to enable more log pages (like the OCP one I mentioned) for predictive failure and health monitoring in large deployments, but even the base NVMe smart-log in node exporter would be awesome. I found that the smart exporter for smartmontools doesn't handle SAS properly, for example, which is why I want to do this right from nvme-cli instead of smartmontools. |
Just to avoid any confusion, the existing nvme collector in node_exporter exposes information gleaned from sysfs, i.e., that which is exposed by the kernel. It is against node_exporter policy to call external binaries. The textfile collector mechanism is exempt from this, since the apps which generate the textfile metrics are not called by node_exporter itself. |
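For readers unfamiliar with the textfile collector: a separate program writes metrics in the Prometheus text format to a `*.prom` file in the directory that node_exporter watches, and node_exporter picks them up on the next scrape. A minimal Python sketch of the writing side follows. The function name and default file name are hypothetical, and the file is written via a temporary file plus rename so that a scrape never sees a half-written file.

```python
import os
import tempfile


def write_textfile_metrics(lines, directory, name="nvme.prom"):
    """Write metric lines to <directory>/<name> using temp file + rename.

    `name` is a hypothetical default; the textfile collector only
    requires that the final file name ends in ".prom".
    """
    final_path = os.path.join(directory, name)
    # Create the temp file in the same directory so the rename stays on
    # one filesystem; the ".tmp" suffix keeps it out of the *.prom glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(lines) + "\n")
        # mkstemp creates the file as 0600; make it readable for the
        # user that node_exporter runs as.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Such a script would typically be run from cron or a systemd timer, which is why it does not conflict with the node_exporter policy on external binaries.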
I suspected as much. Probably the exporter should have some sort of alignment with the |
I understand there is no need to add a new output command, as the node exporter is not going to call any binaries anyway. So good to close? Or am I missing something? |
I think that @jmhands had a wish list in his comment #2189 (comment) |
I am confused. I understood that the POC in the comment is parsing outputs from Or is there still a general interest in getting such a command for non-node-exporter setups? If so, I'd say we should define the expected output. I don't know what is needed here, so I need this input first. I think it's possible to implement it without too much hassle (yeah, I know, famous last words...). |
We can still parse the output through another exporter like the one created by @jmhands. |
After a bit of pondering, I would like to avoid mixing the output of the different commands. The If you want a single command which gives you all the necessary information in one go, I am fine with introducing a new command for this (plugin?). And this means we are back to my original question: do we need this, and what would the output look like? |
From my last comment: I didn't like the idea of changing the existing low-level commands, as they match the spec. I don't mind introducing a user-friendly version which provides the summary. As I don't know what you expect, I'd like to hear what the expected output is. Can you come up with something? |
@igaw I did some digging and I don't think there is anything needed from nvme-cli. I think what we want to do is update the nvme textfile collector to issue the command to obtain the ocp log page and format that as needed. I see that on Ubuntu 22.04 the nvme collector is now on by default. I am not sure if the same is true for other modern distros. Let me do some more digging. @jmhands do you have anything to add here? Note that the nvme textfile collector in node exporter does parse the smart-log log page and display its output for scraping. |
Is there any roadmap for native integration for a Prometheus exporter? I saw some changes coming in json, it would be good to align any exporters on a specific format. My suggestion would be to track drives by "sn" with info on "mn" and "fw" from
sudo nvme id-ctrl /dev/nvme1n1 -o json
then have a standard option for exporting statistics to Prometheus from
sudo nvme smart-log /dev/nvme1n1 -o json
and the OCP log page C0. A small issue is that the C0 log page isn't available in the older releases, but this should be the most helpful log, along with the normal smart-log, to be able to calculate WAF for workloads with "Physical media units written". Other data in the OCP log will be useful for tracking fleet health across many NVMe SSDs. This works with the latest AppImage provided:
sudo ./nvme-cli-latest-x86_64.AppImage ocp smart-add-log /dev/nvme0n1 -o json
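As a rough illustration of the "track drives by sn, with mn and fw as info" idea, the sketch below turns `id-ctrl` JSON into one info-style metric line in the Prometheus text format. Several things here are assumptions and not taken from nvme-cli or any existing exporter: the metric name `nvme_device_info` is invented, and the code assumes the firmware revision appears under the key `fr` (falling back to `fw`) and that string fields may carry padding whitespace.

```python
import json


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def device_info_line(id_ctrl_json: str, device: str) -> str:
    """Build a hypothetical nvme_device_info metric line from id-ctrl JSON."""
    ctrl = json.loads(id_ctrl_json)
    labels = {
        "device": device,
        # Identify strings are assumed to be space-padded, so strip them.
        "sn": str(ctrl.get("sn", "")).strip(),
        "mn": str(ctrl.get("mn", "")).strip(),
        # Assumption: firmware revision key is "fr"; fall back to "fw".
        "fw": str(ctrl.get("fr", ctrl.get("fw", ""))).strip(),
    }
    rendered = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels.items())
    return f"nvme_device_info{{{rendered}}} 1"
```

Keeping the identity in a constant-valued info metric means the numeric smart-log and C0 series only need a `device` (or `sn`) label and can be joined against it at query time.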