2020.08.04
The documentation needs to be improved
- SPANK plugin documentation
- PAPI plugin documentation, including its relationship with SYSPAPI
- PAPI Store documentation
- SYSPAPI plugin documentation and its relationship with PAPI
- SYSPAPI store documentation
- How do we get all events coming from the SPANK plugin to arrive on a connection with the same user-id? Currently the job_start event arrives on a connection from root.
- Eric: What about a feature to be able to specify a hardware counter event mask directly with PAPI/SYSPAPI?
All LDMSD Streams data is conveyed over an LDMS Transport. LDMS Transports are authenticated in one of three ways: "none", "ovis", and "munge".
- "none" : No authentication is done. This should only be used when all users are trusted.
- "ovis" : Secret word authentication. The remote peer is known to know the secret word; effectively every user is trusted as root.
- "munge" : The uid/gid of the remote peer is trusted by virtue of being verified through a third party
The trust level of data published through the stream service is known; however, the subscribing client does not have direct access to this information. We should consider adding API/event data updates to convey this information so that subscribers can decide how, and to what extent, to trust this data.
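As a minimal sketch of what such an update might carry, the example below attaches the transport's authentication method to each stream event so that a subscriber can decide whether to rely on the publisher's uid/gid. The names `StreamEvent`, `TRUST_BY_AUTH`, and `trusted_uid` are hypothetical and are not part of the LDMS API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from transport authentication method to what a
# subscriber may assume about the publisher's identity.
TRUST_BY_AUTH = {
    "none": "unverified",     # no authentication; identity is only a claim
    "ovis": "shared-secret",  # peer knows the secret word; root-equivalent
    "munge": "verified",      # uid/gid verified through a third party
}


@dataclass
class StreamEvent:
    stream: str
    payload: str
    auth: str  # authentication method of the transport that carried the data
    uid: int
    gid: int


def trusted_uid(event: StreamEvent) -> Optional[int]:
    """Return the publisher's uid only when the transport verified it."""
    if TRUST_BY_AUTH.get(event.auth) == "verified":
        return event.uid
    return None
```

A subscriber could then drop or down-weight events for which `trusted_uid` returns `None`.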
We should add a white-list/black-list feature to ldmsd to allow/disallow the publishing of data.
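One possible shape for such a filter is sketched below, keyed on stream name with shell-style patterns. The function name `may_publish`, the pattern syntax, and the rule that deny entries take precedence are assumptions for illustration, not existing ldmsd behavior; a real feature might key on uid/gid instead.

```python
from fnmatch import fnmatchcase


def may_publish(stream, allow, deny):
    """Decide whether data may be published on the named stream.

    Deny patterns win over allow patterns. An empty allow list means
    "allow anything that is not denied".
    """
    if any(fnmatchcase(stream, pat) for pat in deny):
        return False
    if not allow:
        return True
    return any(fnmatchcase(stream, pat) for pat in allow)
```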
Add a "persistent-connection" API for streams.
There has been prolonged interest in running 'ldmsd' as a particular user. There are a number of ways to do this, including:
- Run a separate ldmsd daemon as user 'someone'
- Add a capability for an ldmsd running as root to fork, seteuid/setegid(someone), and continue
- RDMA transports hate to be fork/exec'd
- What configuration, if any, do you inherit from the parent
There are security issues around this:
- What uid/gid do you allow 'someone' to apply to the sets this ldmsd publishes
- Simply disallowing the setting of uid/gid with the API is not sufficient because the user could write a sampler that modifies the set memory directly
- Mark S: understanding and advertising the data sizes and the computation (from the set data size). Especially for multiple feeds going from the aggregators off to the monitoring cluster.
- Phil R: ....and measuring it live. (Ben's dstat sampler and then v4-5 tracking if ldmsd's aren't keeping up)
- we may want to add network usage counters to the ldmsd zap transport and publish the result (each ldms message has a known size). Count message bytes and message events as atomic uint64_t. RDMA pulls need to get counted at the puller and sock maybe at both ends. dstat currently reports read/write io bytes.
- Chris M: holding data at the aggregator because of connecting to Kafka etc.
- Melissa A and Chris: getting rid of Kafka.
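The counter idea in the notes above might look like the following sketch: one byte counter and one event counter per transport endpoint. A lock stands in for the atomic uint64_t counters a C implementation would use, and the mask emulates 64-bit wrap-around. The class `TransportCounters` and its method names are hypothetical.

```python
import threading

U64_MASK = (1 << 64) - 1  # emulate uint64_t wrap-around


class TransportCounters:
    """Per-endpoint message byte and message event counters."""

    def __init__(self):
        self._lock = threading.Lock()
        self.msg_bytes = 0
        self.msg_events = 0

    def record(self, nbytes):
        """Account for one message of nbytes bytes."""
        with self._lock:
            self.msg_bytes = (self.msg_bytes + nbytes) & U64_MASK
            self.msg_events = (self.msg_events + 1) & U64_MASK

    def snapshot(self):
        """Return (bytes, events) as a consistent pair."""
        with self._lock:
            return self.msg_bytes, self.msg_events
```

Sampling `snapshot()` at a fixed interval and differencing successive values gives a live bytes-per-second figure that can be compared with the size estimated from the set data.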
- What are the issues related to monitoring ldmsd from a web service?
- carried in the meta data, but we are ignoring it at the store
- what would have to change (if anything) at intermediate LDMS for apps to get access to some data for response (security limitations)
- changing the rwx model to enable more fine-grained access control
- if you don't have permissions, you can't get the handle to get the set. You also cannot push changes to that set. You can change your own local copy, of course, but then you would have to have been authenticated to be in the ldms ecosystem.
- TODO: big security review and understanding. will take it to GitHub issues.
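The access rule described above (without permission, a peer cannot obtain the handle for a set) can be illustrated with a Unix-style owner/group/other check. This is a sketch that assumes each set carries a uid, a gid, and an octal permission mask. `can_lookup` is a hypothetical helper, it models only the read bit, and the root bypass is an assumption.

```python
def can_lookup(set_uid, set_gid, perm, peer_uid, peer_gid):
    """Return True if the peer may obtain a handle to the set."""
    if peer_uid == 0:
        return True  # assumption: root may look up any set
    if peer_uid == set_uid:
        return bool(perm & 0o400)  # owner read bit
    if peer_gid == set_gid:
        return bool(perm & 0o040)  # group read bit
    return bool(perm & 0o004)      # other read bit
```

Finer-grained control would replace the three fixed classes with something richer, which is the kind of change the security review would need to evaluate.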
How do we go about reducing the number of schema in the system? The immutable nature of a schema, coupled with most systems having many different kinds of nodes, has resulted in an unreasonably large number of schema.
The problems are things like:
- different number of cores
- Multiple schema for metric sets that keep data per core even though they contain the "exact same" data
- different architectures (e.g. knl vs. haswell)
- The /proc/meminfo and /proc/vmstat have extra entries that result in extra schema
- simple configuration inconsistency
- schema name is set by the configuration which means that if the configuration is not consistent, identical metric sets will have different schema names
All of the above create real issues with analysis and visualization.
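To make the problem concrete, the sketch below builds a per-core metric list for a given core count and derives a fingerprint from the metric names and types. Two nodes with the same layout share a fingerprint no matter what schema name their configuration assigns, while a different core count produces a different fingerprint and therefore a distinct schema. The helper names, the metric names, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib


def per_core_metrics(ncores):
    """Build (name, type) pairs for a set that keeps data per core."""
    return [
        (f"cpu{i}_{name}", "u64")
        for i in range(ncores)
        for name in ("user", "sys", "idle")
    ]


def schema_fingerprint(metrics):
    """Hash the ordered metric names and types, ignoring the schema name."""
    h = hashlib.sha256()
    for name, mtype in metrics:
        h.update(f"{name}:{mtype}\n".encode())
    return h.hexdigest()
```

Comparing fingerprints separates schema that differ only by configured name (same fingerprint) from schema that differ in content (different fingerprint).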
Set groups can be used to get around issues with system resources that come and go, e.g. disks, network interfaces. Set groups are mutable collections of other metric sets. One solution to the system resource problem could be to create separate sets for each resource and then 'group' them together into a single named entity so they can be fetched remotely as one entity.
How it works:
- A group is a configuration construct managed by ldmsd. It is not part of the LDMS protocol
- A metric set is created; however, this metric set contains only the names of its members. When an aggregator updates this group, it gets the (potentially updated) list of metric sets
- A group is created at the source with a schema name that has a string in the name that identifies it as a group
- When the updater callback sees this name in the schema, it knows to look up all the entries in the group at once, and similarly with update.
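The steps above can be sketched as follows. A group is modeled as a set whose only contents are member names and whose schema name contains a marker string; the updater checks for that marker and expands the group into its current members. The marker `"grp_"`, the dictionary layout, and the function `update` are hypothetical and only illustrate the flow.

```python
GROUP_MARKER = "grp_"  # hypothetical string identifying a group schema


def is_group(schema_name):
    return GROUP_MARKER in schema_name


def update(set_name, directory, seen=None):
    """Return the data sets to fetch for set_name, expanding groups."""
    seen = set() if seen is None else seen
    if set_name in seen:
        return []  # guard against a group that contains itself
    seen.add(set_name)
    entry = directory[set_name]
    if not is_group(entry["schema"]):
        return [set_name]
    result = []
    for member in entry["members"]:  # membership is re-read on every update
        result.extend(update(member, directory, seen))
    return result


# One set per disk, grouped into a single named entity.
directory = {
    "node1/disks": {"schema": "grp_disks", "members": ["node1/sda", "node1/sdb"]},
    "node1/sda": {"schema": "diskstat"},
    "node1/sdb": {"schema": "diskstat"},
}
```

Because the member list is read each time, a disk that appears later only needs to be added to the group's `members`; the member sets keep their own schema.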