Ceph health messages #31

taam · 2021-11-21T20:14:42Z

For the Ceph health check, it would be nice to see which checks are failing. Just as a starting idea, I locally hacked the following lines into the end of the check_ceph_health function (my Python knowledge is rather limited):

        # Join the summary message of every reported check into one string.
        messages = ", ".join(v['summary']['message'] for v in ceph_health.get('checks', {}).values())
        if messages:
            self.check_message += ": " + messages

(Technically, it would probably be better to put these details on separate lines.)
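
A rough sketch of what the multi-line variant could look like, assuming `ceph_health` is the parsed `health` dict from the API and that each entry in `checks` carries a `summary` and an optional `detail` list. The helper name `format_health_checks` is made up for illustration and is not part of the plugin:

```python
def format_health_checks(ceph_health):
    """Return one line per reported check, followed by indented detail lines.

    Hypothetical helper: assumes the structure
    {'checks': {NAME: {'summary': {'message': ...}, 'detail': [{'message': ...}]}}}.
    """
    lines = []
    # Sort by check name so the output order is stable between runs.
    for name, check in sorted(ceph_health.get('checks', {}).items()):
        summary = check.get('summary', {}).get('message', '')
        lines.append("{}: {}".format(name, summary))
        # Detail entries are optional and may be an empty list.
        for detail in check.get('detail', []):
            lines.append("  " + detail.get('message', ''))
    return "\n".join(lines)
```

The result could then be appended to `self.check_message` after a newline when it is non-empty, so the first line of the plugin output stays short.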


nbuchwitz · 2022-01-01T19:35:36Z

I no longer have a Ceph cluster at hand to test this check. What does the output look like on your system? Maybe you can provide some example data directly from the JSON API, so I can test with it?

taam · 2022-01-03T17:35:36Z

Here are some examples (output from pvesh, Proxmox 6.4):

   "health" : {
      "checks" : {},
      "status" : "HEALTH_OK"
   },

   "health" : {
      "checks" : {
         "POOL_SCRUB_FLAGS" : {
            "detail" : [
               {
                  "message" : "Pool foo has noscrub flag"
               },
               {
                  "message" : "Pool foo has nodeep-scrub flag"
               }
            ],
            "severity" : "HEALTH_OK",
            "summary" : {
               "message" : "Some pool(s) have the noscrub, nodeep-scrub flag(s) set"
            }
         }
      },
      "status" : "HEALTH_OK"
   },

   "health" : {
      "checks" : {
         "OSDMAP_FLAGS" : {
            "detail" : [],
            "severity" : "HEALTH_WARN",
            "summary" : {
               "message" : "nobackfill,norebalance,norecover flag(s) set"
            }
         },
         "POOL_SCRUB_FLAGS" : {
            "detail" : [
               {
                  "message" : "Pool foo has noscrub flag"
               },
               {
                  "message" : "Pool foo has nodeep-scrub flag"
               }
            ],
            "severity" : "HEALTH_OK",
            "summary" : {
               "message" : "Some pool(s) have the noscrub, nodeep-scrub flag(s) set"
            }
         }
      },
      "status" : "HEALTH_WARN"
   },

   "health" : {
      "checks" : {
         "PG_DEGRADED" : {
            "detail" : [
               {
                  "message" : "pg 1.0 is stuck undersized for 123.456789, current state active+recovering+undersized+degraded+remapped, last acting [0,2]"
               },
               {
                  "message" : "pg 1.1 is stuck undersized for 123.456789, current state active+recovery_wait+undersized+degraded+remapped, last acting [1,2]"
               }
            ],
            "severity" : "HEALTH_WARN",
            "summary" : {
               "message" : "Degraded data redundancy: 12345/123456789 objects degraded (0.123%), 4 pgs degraded, 5 pgs undersized"
            }
         }
      },
      "status" : "HEALTH_WARN"
   },
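
For testing without a cluster, the third example above can be fed to the joined-summary approach directly. This is only a sketch: `summarize_checks` and its `skip_ok` parameter are hypothetical, and whether checks with severity `HEALTH_OK` should be hidden is an open question, not something the plugin does today.

```python
import json

# Third sample from above (detail lists shortened), wrapped in braces to make it valid JSON.
SAMPLE = json.loads("""
{
   "health" : {
      "checks" : {
         "OSDMAP_FLAGS" : {
            "detail" : [],
            "severity" : "HEALTH_WARN",
            "summary" : {
               "message" : "nobackfill,norebalance,norecover flag(s) set"
            }
         },
         "POOL_SCRUB_FLAGS" : {
            "detail" : [
               { "message" : "Pool foo has noscrub flag" }
            ],
            "severity" : "HEALTH_OK",
            "summary" : {
               "message" : "Some pool(s) have the noscrub, nodeep-scrub flag(s) set"
            }
         }
      },
      "status" : "HEALTH_WARN"
   }
}
""")


def summarize_checks(health, skip_ok=False):
    """Join the summary messages of all checks; optionally drop HEALTH_OK ones."""
    checks = list(health.get('checks', {}).values())
    if skip_ok:
        # Assumption: a check whose own severity is HEALTH_OK is not worth reporting.
        checks = [c for c in checks if c.get('severity') != 'HEALTH_OK']
    return ", ".join(c['summary']['message'] for c in checks)


print(summarize_checks(SAMPLE['health']))
print(summarize_checks(SAMPLE['health'], skip_ok=True))
```

With `skip_ok=True` only the `OSDMAP_FLAGS` summary remains, which matches the overall `HEALTH_WARN` status more closely.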

nbuchwitz added the enhancement (New feature or request) label Jan 1, 2022

nbuchwitz self-assigned this Jan 1, 2022
