Writing a sampler

oceandlr edited this page Nov 17, 2019 · 84 revisions

Basics

The easiest way to write a sampler is to study an existing one, such as meminfo.

    Meminfo overview:
    About: Samples from /proc/meminfo
    Run script:
       load name=meminfo                                                                                          
       config name=meminfo producer=localhost1 instance=localhost1/meminfo  schema=meminfo component_id=1        
       start name=meminfo interval=1000000
    
    Querying:    
    XXX/OVIS-4.3.1/sbin/ldms_ls -x sock -p 61101 -a ovis -A conf=/tmp/secret -l
    localhost1/meminfo: consistent, last update: Fri Oct 18 21:34:42 2019 -0600 [187636us] 
    M u64        component_id                               1
    D u64        job_id                                     0
    D u64        app_id                                     0
    D u64        MemTotal                                   32924784
    D u64        MemFree                                    8213892
    D u64        MemAvailable                               26760388
    D u64        Buffers                                    832084

Required Functions

The functions required of every sampler are listed in the ldmsd_sampler struct definition, near the bottom of the sampler's source file. This struct names the specific functions to be called at various points. The get_plugin function returns the struct for this plugin.

     static struct ldmsd_sampler meminfo_plugin = {
        .base = {
                .name = SAMP,
                .type = LDMSD_PLUGIN_SAMPLER,
                .term = term,
                .config = config,
                .usage = usage,
        },
        .get_set = get_set,
        .sample = sample,
     };
     struct ldmsd_plugin *get_plugin(ldmsd_msg_log_f pf)
     {
        msglog = pf;
        set = NULL;
        return &meminfo_plugin.base;
     }

About the functions:

  • usage -- returns the help text that describes the sampler's configuration arguments.
  • config -- called when the sampler is configured.
  • sample -- performs the actual sampling.
  • get_set -- still exists in v4 but is no longer called.
  • term -- called when the sampler terminates.

Other:

  • msglog -- wrapper call for logging. Log levels are: LDMSD_LDEBUG, LDMSD_LINFO, LDMSD_LWARNING, LDMSD_LERROR, LDMSD_LCRITICAL, LDMSD_LALL.
  • SAMP -- #define for the sampler name, for convenience (e.g., "meminfo").

Some of these are described in more detail below.

get_plugin

get_plugin is called when the plugin is loaded with the load command.

schema uniqueness

A schema name must map uniquely to its data definition; even array lengths must be the same. Nothing checks this, and nothing generates unique schema names for you, yet memory sizes, pointers, etc. are expected to be consistent for a given schema. A common convention for /proc-related sources is to obtain the architecture information and encode it into the schema name.

There is a metadata generation number, which changes when the metadata for a schema changes. It should not be used to detect changes in a schema in order to reuse a name.

The schema name uniqueness limitation will be obviated in v5.
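
The convention of encoding the architecture into the schema name can be sketched in plain C as follows. This is only an illustration: the helper names and the "<base>_<arch>" pattern are assumptions made for this example, not part of LDMS, and a site may prefer a different encoding.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Build "<base>_<arch>" into buf. Returns 0 on success, -1 if the result
 * would not fit. Hypothetical helper; the naming pattern is an assumption.
 */
int make_schema_name(char *buf, size_t len, const char *base, const char *arch)
{
        int n = snprintf(buf, len, "%s_%s", base, arch);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Same as above, but takes the architecture string from uname(2). */
int make_schema_name_for_host(char *buf, size_t len, const char *base)
{
        struct utsname u;

        if (uname(&u) != 0)
                return -1;
        return make_schema_name(buf, len, base, u.machine);
}
```

The resulting string could then be passed as the schema= argument on the config line, so that hosts whose data definitions differ do not share a schema name.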

Config

Config is invoked by the config command. Arguments from the config line are passed in the attr_value_list *avl. They can be extracted by name, as shown below. av_value returns NULL if there is no attribute with that name:

     value = av_value(avl, "num_metrics");
     if (value)
                num_metrics = atoi(value);
     else
                num_metrics = -1;
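
Because atoi cannot report malformed input, a sampler may prefer a stricter conversion for numeric attributes. The following is a minimal sketch, assuming the string comes from a lookup such as av_value; parse_int_attr is a hypothetical helper written for this page, not an LDMS function.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse an optional integer attribute. 'value' is the string returned by
 * the attribute lookup, or NULL if the attribute was absent.
 * Returns 0 and stores the result (or 'dflt' when absent) in *out;
 * returns EINVAL if the string is not a valid int.
 */
int parse_int_attr(const char *value, int dflt, int *out)
{
        char *end;
        long v;

        if (!value) {
                *out = dflt;
                return 0;
        }
        errno = 0;
        v = strtol(value, &end, 10);
        if (errno || end == value || *end != '\0' || v < INT_MIN || v > INT_MAX)
                return EINVAL;
        *out = (int)v;
        return 0;
}
```

With this helper, a config line carrying num_metrics=10x would yield EINVAL, which config could return, instead of being silently read as 10.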

Certain metrics are common to all samplers, for example instance_name, producer_name, and component_id. Other items, such as schema, can also be specified to samplers even though they are not metrics. Samplers call various base functions to create these metrics or use these values. Config should call the following function to configure the base:

    base = base_config(avl, SAMP, SAMP, msglog);

You may want to prevent config from being called multiple times if your code is not prepared to handle the resulting configuration changes. For example, if your config creates a metricset, then subsequent calls to config on a running sampler would have to handle how the changes affect the current metricset, particularly while sampling is occurring. In many, but not all, cases a sampler has only a single metricset, so checking for the existence of the set at the beginning of config guards against this scenario:

    if (set) {
            msglog(LDMSD_LERROR, SAMP ": Set already created.\n");
            return EINVAL;
    }

If you are using a configuration file, a non-zero return from config will abort processing the configuration file.

Creating A MetricSet

Generally, a metricset is created by creating the schema and then adding metrics to it. In many samplers with a well-known data source (e.g., /proc/meminfo), the metricset is created by reading the data source once to get the names of the metrics and adding them to the schema.

The process proceeds as follows:

a) Create the schema from the base. This will also add the base metrics to the schema:

    schema = base_schema_new(base);
    metric_offset = ldms_schema_metric_count_get(schema);

The last call tells you how many metrics were added from the base. In some cases this is a useful number to retain, in case you later set values in the metricset based on their position in the schema (we will see this in the sample example below).

b) Open the data source, obtain the metric names, and add them to the schema, including their type:

    mf = fopen(procfile, "r");
    if (!mf) {
            msglog(LDMSD_LERROR, SAMP ": could not open '%s'\n", procfile);
            rc = ENOENT;
            goto err;
    }
    do {
            s = fgets(lbuf, sizeof(lbuf), mf);
            if (!s)
                    break;  /* end of file */
            rc = sscanf(lbuf, "%s %" SCNu64,
                        metric_name, &metric_value);
            if (rc < 2)
                    break;  /* not a "name value" line */

            /* Strip the colon from metric name if present */
            i = strlen(metric_name);
            if (i && metric_name[i-1] == ':')
                    metric_name[i-1] = '\0';

            rc = ldms_schema_metric_add(schema, metric_name, LDMS_V_U64);
            if (rc < 0) {
                    rc = ENOMEM;
                    goto err;
            }
    } while (s);

If ldms_schema_metric_add fails, it will return -1. Otherwise, it will return the positional number of the added metric.
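
The per-line parsing in step b) can be tried outside of ldmsd. The sketch below is a standalone illustration with no LDMS calls; parse_meminfo_line and the 64-byte name buffer are assumptions made for this example.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/meminfo-style line such as "MemTotal:  32924784 kB".
 * On success, copies the name (without the trailing colon) into 'name',
 * stores the number in *value, and returns 0. Returns -1 if the line does
 * not contain both a name and a value.
 */
int parse_meminfo_line(const char *line, char name[64], uint64_t *value)
{
        size_t n;

        /* %63s bounds the copy to the 64-byte buffer, including the NUL */
        if (sscanf(line, "%63s %" SCNu64, name, value) != 2)
                return -1;
        n = strlen(name);
        if (n && name[n - 1] == ':')
                name[n - 1] = '\0';
        return 0;
}
```

In a sampler, the name obtained this way would be the one passed to ldms_schema_metric_add.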

c) Once all the metrics have been added, get the set:

    set = base_set_new(base);

Sample

The sample function sets the values of the metrics in the set. In addition, some values need to be obtained and set for the base, for example, the timestamp of the set.

a) We first check to make sure we have a set (in case sample is called on a non-configured plugin):

    if (!set) {
            msglog(LDMSD_LDEBUG, SAMP ": plugin not initialized\n");
            return EINVAL;
    }

b) Next, call the base to perform any sample-related activities (e.g., setting the timestamp of the set):

    base_sample_begin(base);

c) In this case, we will add values by position, so use the index taking into account the base metrics, determined back when we created the metricset:

    metric_no = metric_offset;

d) Now process the data source to get the values and set them by position:

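    fseek(mf, 0, SEEK_SET);
    do {
            s = fgets(lbuf, sizeof(lbuf), mf);
            if (!s)
                    break;  /* end of file */
            rc = sscanf(lbuf, "%s %" SCNu64, metric_name, &v.v_u64);
            if (rc != 2)
                    break;  /* stop at the first line that does not parse */

            ldms_metric_set(set, metric_no, &v);
            metric_no++;
    } while (s);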

e) Finally, end any transactional information in the base (e.g., the duration of the sample):

    base_sample_end(base);

A non-zero return code from sample will stop the plugin. If an error case may be temporary or may be a case for only some of the variables in the set, a common methodology is to return 0, even in these cases.
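
The positional bookkeeping in steps c) and d) can be sketched without LDMS. In this illustration a plain array stands in for the metric set and 'offset' plays the role of metric_offset; sample_by_position is a hypothetical name, and the buffer sizes are assumptions.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Re-read 'mf' from the start and store the value found on each line at
 * values[offset], values[offset + 1], ... in file order.
 * Stops at end of file, at the first line that does not parse, or when
 * the array is full. Returns the number of values stored.
 */
size_t sample_by_position(FILE *mf, uint64_t *values, size_t offset,
                          size_t capacity)
{
        char lbuf[256];
        char name[64];
        uint64_t v;
        size_t idx = offset;

        if (fseek(mf, 0, SEEK_SET) != 0)
                return 0;
        while (idx < capacity && fgets(lbuf, sizeof(lbuf), mf)) {
                if (sscanf(lbuf, "%63s %" SCNu64, name, &v) != 2)
                        break;
                values[idx++] = v;
        }
        return idx - offset;
}
```

Returning a count rather than an error code leaves the policy to the caller: a sample function built on this could log a short read and still return 0, following the convention described above for temporary errors.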

Usage

usage returns information on the configuration of a sampler. Be sure to include the macro providing information on the base usage, even if you have no other additional configuration arguments:

    return  "config name=" SAMP BASE_CONFIG_USAGE;

Usage can be seen at the command line by:

    > ldmsd -u meminfo
     LDMSD plugins in XXX/lib/ovis-ldms : 
    ======= SAMPLER meminfo:
     config name=meminfoproducer=<name> instance=<name> [component_id=<int>] [schema=<name>]
       [job_set=<name>] [job_id=<name>] [app_id=<name>] [job_start=<name>] [job_end=<name>]
       [uid=<user-id>] [gid=<group-id>] [perm=<mode_t permission bits>]
    producer     A unique name for the host providing the data
    instance     A unique name for the metric set
    component_id A unique number for the component being monitored, Defaults to zero.
    schema       The name of the metric set schema, Defaults to the sampler name
    job_set      The instance name of the set containing the job data, default is 'job_info'
    job_id       The name of the metric containing the Job Id, default is 'job_id'
    app_id       The name of the metric containing the Application Id, default is 'app_id'
    job_start    The name of the metric containing the Job start time, default is 'job_start'
    job_end      The name of the metric containing the Job end time, default is 'job_end'
    uid          The user-id of the set's owner (defaults to geteuid())
    gid          The group id of the set's owner (defaults to getegid())
    perm         The set's access permissions (defaults to 0777)
    
    =========================

Term

term is called when the plugin is terminated. In term, be sure to close any file handles, delete the base, delete the set, and free any other memory you have allocated in your code:

        if (base)
                base_del(base);
        if (set)
                ldms_set_delete(set);
        set = NULL;
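
The snippet above covers the base and the set; file handles and heap memory owned by the sampler need the same treatment. A minimal sketch follows, assuming a hypothetical sampler_state struct that groups those resources (real samplers typically keep them in file-scope variables).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-sampler state; not an LDMS type. */
struct sampler_state {
        FILE *mf;       /* open data source, or NULL */
        char *scratch;  /* heap buffer owned by the sampler, or NULL */
};

/*
 * Release everything the sampler owns and reset the pointers, so that a
 * second call is harmless.
 */
void sampler_state_cleanup(struct sampler_state *st)
{
        if (st->mf) {
                fclose(st->mf);
                st->mf = NULL;
        }
        free(st->scratch);      /* free(NULL) is a no-op */
        st->scratch = NULL;
}
```

Resetting each pointer after release mirrors the set = NULL line above and keeps a later sample or config call from using a stale handle.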

Directory and Supporting files

Contributed samplers go under the contrib/sampler directory in their own directory, as described in Contributing. Include not only the sampler, but also the Makefile.am, the man page for the sampler, and any tests.

NOTE: this directory structure has changed slightly since the LDMSCON2019 tutorial slides.

The following assumes a directory structure of:

    XXX/ldms/src/contrib/sampler/mysite

where all of the samplers for mysite will be in subdirectories under mysite. E.g:

    XXX/ldms/src/contrib/sampler/mysite/mysampler

A per-sampler directory structure is also in progress for the main-line samplers.

Makefile.am

Add lines to the following files:

XXX/ldms/configure.ac

    OPTION_DEFAULT_ENABLE([mysampler], [ENABLE_MYSITESAMPLER])
    AC_CONFIG_FILES([Makefile src/Makefile src/core/Makefile...
                src/contrib/sampler/mysite/Makefile
                src/contrib/sampler/mysite/mysampler/Makefile

XXX/ldms/src/contrib/sampler/Makefile.am

    if ENABLE_MYSITESAMPLER
        MAYBE_MYSITESAMPLER = mysite
    endif
    SUBDIRS += $(MAYBE_MYSITESAMPLER) 

XXX/ldms/src/contrib/sampler/mysite/Makefile.am

    if ENABLE_MYSITESAMPLER
        MAYBE_MYSITESAMPLER = mysampler
    endif
    SUBDIRS += $(MAYBE_MYSITESAMPLER) 

XXX/ldms/src/contrib/sampler/mysite/mysampler/Makefile.am

    pkglib_LTLIBRARIES =
    lib_LTLIBRARIES =
    check_PROGRAMS =
    dist_man7_MANS =
        
    CORE = ../../../core
    SAMPLER= ../../../sampler
    AM_CFLAGS = -I$(srcdir)/$(CORE) -I$(top_srcdir) -I../../.. @OVIS_LIB_INCDIR_FLAG@ \
                -I$(srcdir)/../../../ldmsd
    AM_LDFLAGS = @OVIS_LIB_LIB64DIR_FLAG@ @OVIS_LIB_LIBDIR_FLAG@
    
    BASE_LIBADD = ../../libsampler_base.la
    COMMON_LIBADD = $(CORE)/libldms.la libfoo.la \
            @LDFLAGS_GETTIME@ -lovis_util -lcoll
    
    if ENABLE_MYSITESAMPLER
        lib_LTLIBRARIES += libfoo.la
        libfoo_la_SOURCES = foo.c foo.h
      
        libmysampler_la_SOURCES = mysampler.c
        libmysampler_la_LIBADD = $(BASE_LIBADD) $(COMMON_LIBADD)
        libmysampler_la_CFLAGS = $(AM_CFLAGS) -I$(srcdir)/$(SAMPLER)
        pkglib_LTLIBRARIES += libmysampler.la
        dist_man7_MANS += Plugin_mysampler.man
        
        check_PROGRAMS += mysampler_test
        mysampler_test_SOURCES = mysampler.c mysampler.h
        mysampler_test_CFLAGS = -DMAIN
    endif
    EXTRA_DIST = Plugin_mysampler.man

man page

Naming convention for plugins: Plugin_mysampler.man

Canonical headers and formatting for Linux man pages are used.

Advanced Topics

Multiple metric sets

test_sampler is a sampler that supports creation of multiple sets, with either the same or different schema, in the same sampler. This sampler keeps a list of schema and a list of sets and the set list is iterated through in the sample function.

The sets have different instance names and separate metadata. From the point of view of ldms_ls or the store these are no different than if the sets came from different samplers.
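
The pattern of keeping a list of sets and walking it in sample can be sketched as follows. This is not the test_sampler code: fake_set is a stand-in for a real metric set, and every name here is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for one metric set: an instance name and a single value. */
struct fake_set {
        const char *instance;
        uint64_t value;
        struct fake_set *next;
};

/* Add a set to the front of the list. Returns NULL if allocation fails. */
struct fake_set *set_list_add(struct fake_set **head, const char *instance)
{
        struct fake_set *s = calloc(1, sizeof(*s));

        if (!s)
                return NULL;
        s->instance = instance;
        s->next = *head;
        *head = s;
        return s;
}

/* One sampling pass: update every set. Returns the number of sets updated. */
size_t sample_all(struct fake_set *head)
{
        size_t n = 0;
        struct fake_set *s;

        for (s = head; s; s = s->next) {
                s->value++;
                n++;
        }
        return n;
}

/* Free the whole list and reset the head pointer. */
void set_list_free(struct fake_set **head)
{
        while (*head) {
                struct fake_set *next = (*head)->next;

                free(*head);
                *head = next;
        }
}
```

In a real sampler, each pass over the list would wrap the per-set updates in that set's own begin and end calls, so that every set gets its own timestamp.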

Its simplest form is triggered by a parameter action=default which creates one or more sets of the same schema with some default variables.

Configuring:

    config name=test_sampler producer=localhost1 action=default schema=test1 num_sets=2 component_id=1
    start name=test_sampler interval=1000000  

Querying:

    XXXX> ldms_ls -x sock -p 61101 -a ovis -A conf=/tmp/secret -l
    set_0: consistent, last update: Thu Oct 10 09:28:07 2019 -0600 [345791us] 
    D u64        metric_0                                   0
    D u64        metric_1                                   1
    D u64        metric_2                                   2
    D u64        metric_3                                   3
    D u64        metric_4                                   4
    D u64        metric_5                                   5
    D u64        metric_6                                   6
    D u64        metric_7                                   7
    D u64        metric_8                                   8
    D u64        metric_9                                   9
               
    set_1: consistent, last update: Thu Oct 10 09:28:07 2019 -0600 [345790us] 
    D u64        metric_0                                   0
    D u64        metric_1                                   1
    D u64        metric_2                                   2
    D u64        metric_3                                   3
    D u64        metric_4                                   4
    D u64        metric_5                                   5
    D u64        metric_6                                   6
    D u64        metric_7                                   7
    D u64        metric_8                                   8
    D u64        metric_9                                   9
    

Querying:

    XXXX> ldms_ls -x sock -p 61101 -a ovis -A conf=/tmp/secret -v
    Schema         Instance                 Flags  Msize  Dsize  UID    GID    Perm       Update            Duration          Info    
    -------------- ------------------------ ------ ------ ------ ------ ------ ---------- ----------------- ----------------- --------
    test1          set_1                       CL     488    136  22398  22398 -rwxrwxrwx 1570721297.355649              0.000002 "updt_hint_us"="1000000:" 
    test1          set_0                       CL     488    136  22398  22398 -rwxrwxrwx 1570721297.355650          0.000000 "updt_hint_us"="1000000:" 
    -------------- ------------------------ ------ ------ ------ ------ ------ ---------- ----------------- ----------------- --------
    Total Sets: 2, Meta Data (kB): 0.98, Data (kB) 0.27, Memory (kB): 1.25

In a more complex form, you can define the schema for each set to be different:

Configuring:

    config name=test_sampler action=add_schema schema=test1  num_metrics=10                                                             
    config name=test_sampler action=add_set schema=test1 instance=localhost9/test_sampler component_id=9 push=1 producer=localhost9 jobid=666                                                                                                                               
    config name=test_sampler action=add_schema schema=test2 num_metrics=9                                                                    
    config name=test_sampler action=add_set schema=test2 instance=localhost10/test_sampler component_id=10 push=1 producer=localhost10 jobid=666   
    start name=test_sampler interval=1000000   

Querying:

    XXXX> ldms_ls -x sock -p 61101 -a ovis -A conf=/tmp/secret -l
    localhost10/test_sampler: consistent, last update: Thu Oct 10 10:00:46 2019 -0600 [790147us] 
    M u64        component_id                               10
    D u64        job_id                                     666
    D u64        metric_0                                   5
    D u64        metric_1                                   5
    D u64        metric_2                                   5
    D u64        metric_3                                   5
    D u64        metric_4                                   5
    D u64        metric_5                                   5
    D u64        metric_6                                   5
    D u64        metric_7                                   5
    D u64        metric_8                                   5
      
    localhost9/test_sampler: consistent, last update: Thu Oct 10 10:00:46 2019 -0600 [790149us] 
    M u64        component_id                               9
    D u64        job_id                                     666
    D u64        metric_0                                   5
    D u64        metric_1                                   5
    D u64        metric_2                                   5
    D u64        metric_3                                   5
    D u64        metric_4                                   5
    D u64        metric_5                                   5
    D u64        metric_6                                   5
    D u64        metric_7                                   5
    D u64        metric_8                                   5
    D u64        metric_9                                   5

Querying:

    XXXX> ldms_ls -x sock -p 61101 -a ovis -A conf=/tmp/secret -v
    Schema         Instance                 Flags  Msize  Dsize  UID    GID    Perm       Update            Duration          Info    
    -------------- ------------------------ ------ ------ ------ ------ ------ ---------- ----------------- ----------------- --------
    test1          localhost9/test_sampler     CL     592    144  22398  22398 -rwxrwxrwx 1570723250.794460          0.000001 "updt_hint_us"="1000000:" 
    test2          localhost10/test_sampler    CL     560    136  22398  22398 -rwxrwxrwx 1570723250.794458          0.000003 "updt_hint_us"="1000000:" 
    -------------- ------------------------ ------ ------ ------ ------ ------ ---------- ----------------- ----------------- --------
    Total Sets: 2, Meta Data (kB): 1.15, Data (kB) 0.28, Memory (kB): 1.43
                                                                                                                      

This is a good example of a case where multiple calls to config need to be supported. A convention for these situations is that the keyword 'action' be used to distinguish how each of the config lines should be handled:

    static int config(struct ldmsd_plugin *self, struct attr_value_list *kwl, struct attr_value_list *avl)
    {
        char *action;
        int rc;

        ...

        action = av_value(avl, "action");
        if (action) {
                rc = 0;
                if (0 == strcmp(action, "add_schema")) {
                        rc = config_add_schema(avl);
                } else if (0 == strcmp(action, "add_set")) {
                        rc = config_add_set(avl);
                } else if (0 == strcmp(action, "default")) {
                        rc = config_add_default(avl);
                } else {
                        msglog(LDMSD_LERROR, "test_sampler: Unrecognized "
                                "action '%s'.\n", action);
                        rc = EINVAL;
                }
                return rc;
        }

        ...
    }
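
As the number of actions grows, the if/else chain can be replaced by a lookup table. The sketch below is one possible arrangement, not how test_sampler is written; the types, handler names, and the choice to treat a missing action as an error are all assumptions made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* A handler receives an opaque context (e.g., the attribute list). */
typedef int (*action_fn)(void *ctx);

struct action_entry {
        const char *name;
        action_fn fn;
};

/*
 * Look up 'action' in 'table' (n entries) and run its handler.
 * Returns the handler's result, or EINVAL for an unknown or missing action.
 */
int dispatch_action(const struct action_entry *table, size_t n,
                    const char *action, void *ctx)
{
        size_t i;

        if (!action)
                return EINVAL;
        for (i = 0; i < n; i++) {
                if (0 == strcmp(action, table[i].name))
                        return table[i].fn(ctx);
        }
        return EINVAL;
}

/* Example handlers: each one only counts how often it ran. */
int on_add_schema(void *ctx) { ++*(int *)ctx; return 0; }
int on_add_set(void *ctx) { ++*(int *)ctx; return 0; }

const struct action_entry example_actions[] = {
        { "add_schema", on_add_schema },
        { "add_set", on_add_set },
};
```

Adding a new action then means adding one table entry and one handler, and the error path for unrecognized actions stays in a single place.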

Multiples of the same sampler

Multiples of the same sampler will not be supported until v5.

Changing the metrics in a sampler's metric set

If you want to change the metrics in a sampler's metric set, you should create a new set with a different schema name. See Writing-a-Store for the implications on the store of creating sets with a new schema.

Starting/Stopping/Changing Rates of a sampler

To change the sampling rate, you have to stop the sampler and start it again with the new interval.

Robustness and Error Handling

a) config -- Multiple calls to config are possible, and, possibly, even desirable. In these cases, different actions in configuration are typically distinguished by including an action parameter in the config line. Examples include the case above where multiple schema and sets are supported by the sampler; another case is in samplers with performance counters which may have one config line that specifies one or more performance counters to be set and another config line to finalize the set once all the performance counters have been set.

    static int config(struct ldmsd_plugin *self, struct attr_value_list *kwl, struct attr_value_list *avl)
    {
            char *action;
            int rc;

            action = av_value(avl, "action");
            if (action) {
                    rc = 0;
                    if (0 == strcmp(action, "add_schema")) {
                            rc = config_add_schema(avl);
                    } else if (0 == strcmp(action, "add_set")) {
                            rc = config_add_set(avl);
                    } else if (0 == strcmp(action, "default")) {
                            rc = config_add_default(avl);
                    } else {
                            msglog(LDMSD_LERROR, "test_sampler: Unrecognized "
                                    "action '%s'.\n", action);
                            rc = EINVAL;
                    }
                    return rc;
            }

            ...
    }

In many cases, once the set has been established, you no longer want to change the configuration. A common methodology in these cases is to check for the existence of the set in the config, and return an error if the set exists. If you are using a configuration file, a non-zero return from config will abort processing the configuration file.

b) sample -- sample is called repeatedly. A non-zero return code from sample will stop the sampler. If an error case may be temporary or may be a case for only some of the variables in the set, a common methodology is to return 0, even in these cases.

Job Data

under construction
