
gpcdr kernel module

ovis-hpc edited this page Nov 6, 2018 · 4 revisions

Cray's gpcdr kernel module is the source of the aries_linkstatus sampler metrics. It is also the source of the Cray Gemini traffic and stall metrics in the cray_gemini_r_sampler. It can also supply the Cray Aries traffic and stall metrics in the cray_aries_r_sampler, but it is recommended that you turn those metrics off in that sampler and collect the traffic and stall metrics from the aries_nic_mmr and aries_rtr_mmr samplers instead. Those samplers do not use gpcdr; they read the counter data more efficiently through the gpcd ioctl interface. The linkstatus metrics cannot be read through the gpcd interface, so gpcdr is still required for them.

Setting up gpcdr on Cray XC40 systems

Install the following RPMs on the compute nodes (as of this writing, they may be packaged with the SMW RPMs, but they must be installed on the computes):

 cray-gni-gpcdr-utils-6.0.25-6.0.5.0_3.28__gd019b74.ari.x86_64.rpm
 cray-gni-gpcdr-utils-man-6.0.25-6.0.5.0_3.28__gd019b74.ari.x86_64.rpm

You only need to start the service once:

 /etc/init.d/gpcdr start

This loads the gpcdr_ari kernel module and sets up the variable directories. After this, the module is loaded automatically at every reboot, so you do not need to run start again.
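To confirm that the module is loaded (for example, after a reboot), you can look for gpcdr_ari in /proc/modules, which is what `lsmod | grep gpcdr_ari` does. The following is a minimal Python sketch of that check; the script and its `module_loaded` helper are hypothetical and are not part of the Cray RPMs.

```python
"""Check whether the gpcdr_ari kernel module is loaded (hypothetical helper).

Each line of /proc/modules starts with the module name, followed by its
size, use count, dependencies, state, and load address.
"""

from pathlib import Path

MODULE_NAME = "gpcdr_ari"  # module name from the setup notes above


def module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` is listed in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] == name:
            return True
    return False


def main() -> None:
    proc_modules = Path("/proc/modules")
    if not proc_modules.exists():
        print("/proc/modules not found; is this a Linux compute node?")
        return
    loaded = module_loaded(MODULE_NAME, proc_modules.read_text())
    print(f"{MODULE_NAME} loaded: {loaded}")


if __name__ == "__main__":
    main()
```

Matching on the whole first field, rather than a substring, avoids false positives from modules whose names merely contain "gpcdr".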

At this point, you can verify on a compute node that the link metrics are being properly exposed:

 nid00028:/sys/devices/virtual/gni/gpcdr0/metricsets # more linksendstatus/metrics
 timestamp 1541537913199 ms
 sendlinkstatus:000 3 lanes
 sendlinkstatus:001 0 lanes
 sendlinkstatus:002 0 lanes
 sendlinkstatus:003 0 lanes
 sendlinkstatus:004 0 lanes
 sendlinkstatus:005 0 lanes
 sendlinkstatus:006 0 lanes
 ...
 #
 nid00028:/sys/devices/virtual/gni/gpcdr0/metricsets # more linkrecvstatus/metrics
 timestamp 1541537920797 ms
 recvlinkstatus:000 3 lanes
 recvlinkstatus:001 0 lanes
 recvlinkstatus:002 0 lanes
 recvlinkstatus:003 0 lanes
 recvlinkstatus:004 0 lanes
 recvlinkstatus:005 0 lanes
 recvlinkstatus:006 0 lanes
 ...
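To check these files from a script instead of with `more`, you can parse them line by line. The sketch below assumes, based only on the two sample outputs above, that the first line has the form `timestamp <value> ms` and every other line has the form `<name> <value> <units>`; lines that do not match (such as the `...` shown above) are skipped. The `parse_metrics` helper is hypothetical, and the gpcdr0 path is copied from the example session and may differ on other nodes.

```python
"""Parse gpcdr metricset 'metrics' files such as linksendstatus/metrics.

A minimal sketch based on the sample output only; it is not an official
description of the gpcdr file format.
"""
from __future__ import annotations

from pathlib import Path

# Directory taken from the example session; gpcdr0 may differ elsewhere.
METRICSET_DIR = Path("/sys/devices/virtual/gni/gpcdr0/metricsets")


def parse_metrics(text: str) -> tuple[int | None, dict[str, tuple[int, str]]]:
    """Return (timestamp_ms, {metric_name: (value, units)})."""
    timestamp_ms = None
    metrics: dict[str, tuple[int, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, raw_value, units = fields
        try:
            value = int(raw_value)
        except ValueError:
            continue  # skip lines whose value is not an integer
        if name == "timestamp":
            timestamp_ms = value
        else:
            metrics[name] = (value, units)
    return timestamp_ms, metrics


def main() -> None:
    for metricset in ("linksendstatus", "linkrecvstatus"):
        path = METRICSET_DIR / metricset / "metrics"
        if not path.exists():
            print(f"{path} not found; is gpcdr running on this node?")
            continue
        timestamp_ms, metrics = parse_metrics(path.read_text())
        print(f"{metricset}: timestamp={timestamp_ms} ms, {len(metrics)} metrics")
        for name, (value, units) in sorted(metrics.items()):
            print(f"  {name} = {value} {units}")


if __name__ == "__main__":
    main()
```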
