2019.04.15
ovis-hpc edited this page Apr 15, 2019
- Project changes between OVIS v3 and v4
- OVIS v3 exposed LDMS, Baler, and SOS as a single project
- Lightweight Distributed Metric Service (LDMS) – data collection, transport, and storage
- Baler – log file pattern tagging and analysis
- Scalable Object Store (SOS) – object store targeting HPC ingest, read, and analysis needs
- OVIS v4 exposes LDMS, Baler, and SOS as independent projects
- Baler is dependent on SOS
- LDMS store-sos plugin depends on SOS
- SOS is independent of LDMS and Baler
- Submodule confusion
- The submodules in OVIS are no longer maintained
- How to build with SOS without using submodules
- Check out the desired version of SOS
- Configuration
- Prerequisites for Python support
- Python 2.7+
- Numpy
- Cython 0.29+
- You can use pip to install all of these
- If you don’t want to or can’t support these dependencies, use --disable-python on the configure line
- cd sos-latest-stable
- ./autogen.sh
- mkdir build
- cd build
- ../configure --prefix=__install-dir__ \
- --libdir=__install-dir__/lib64 \
- --libexecdir=__install-dir__/lib64
- [--disable-python]
- Building SOS
- make && sudo make install
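The configure and build steps above can be collected into a small script. This is only a sketch under assumptions: `PREFIX`, `SRC_DIR`, and `WITH_PYTHON` are hypothetical variable names introduced here, and the default install location is a placeholder rather than a recommended path. The build function is defined but not invoked, so sourcing the script has no side effects.

```shell
#!/bin/sh
# Sketch of an out-of-tree SOS build.
# PREFIX, SRC_DIR and WITH_PYTHON are assumed names; adjust to your site.
PREFIX="${PREFIX:-$HOME/opt/ovis}"
SRC_DIR="${SRC_DIR:-sos-latest-stable}"

# Print the configure arguments, one per line.
sos_configure_args() {
    printf '%s\n' "--prefix=$PREFIX" \
                  "--libdir=$PREFIX/lib64" \
                  "--libexecdir=$PREFIX/lib64"
    # Python support needs Python 2.7+, Numpy and Cython 0.29+;
    # set WITH_PYTHON=no to build without it.
    if [ "${WITH_PYTHON:-yes}" != "yes" ]; then
        printf '%s\n' "--disable-python"
    fi
}

# Run the whole build in a subshell so the caller's working directory
# is left unchanged. Word splitting of the argument list is intended,
# so PREFIX must not contain spaces.
build_sos() (
    cd "$SRC_DIR" &&
    ./autogen.sh &&
    mkdir -p build &&
    cd build &&
    ../configure $(sos_configure_args) &&
    make &&
    sudo make install
)

# Uncomment to run the build:
# build_sos
```

Calling `sos_configure_args` on its own prints the configure line for review before anything is built.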
- How to build zap_ugni without gpcd submodule (XC/Aries only; XE/Gemini uses system gpcdr)
- Configuration that points to pre-built gpcd libs and headers
- --enable-ugni
- --with-aries-libgpcd=/opt/cray/gni/default/lib64/,/opt/cray/gni/default/include/gpcd/
- gpcd libs and headers location
- Cray CLE 6.? UP?:
- /opt/cray/gni/default/include/gpcd/
- /opt/cray/gni/default/lib64/
- Getting the gpcd code
- https://github.com/ovis-hpc/gpcd
- git clone git@github.com:ovis-hpc/gpcd.git gpcd
- Building gpcd
- Set up your environment to use the gnu compiler
- cd gpcd
- ./autogen.sh
- mkdir build
- cd build
- ../configure --prefix=/gpcd
- make && make install
- This will install libs in gpcd/lib/
- This will install headers in gpcd/include/gpcdlocal/
- NOTE: The local build names libs with the label “local” so you must set up symbolic links for actual lib names:
- cd build_dir/lib
- ln -s libgpcdlocal.so.0.0.0 libgpcd.so
- ln -s libgpcdlocal.so.0.0.0 libgpcd.so.0
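The symbolic-link step above can be wrapped in a small helper so that it is safe to re-run. This is a sketch, not part of gpcd or LDMS: `link_gpcd_libs` is a hypothetical function name, and the directory you pass is whatever `lib/` directory your own gpcd install produced.

```shell
#!/bin/sh
# link_gpcd_libs LIBDIR
# Create libgpcd.so and libgpcd.so.0 next to the locally built
# libgpcdlocal.so.0.0.0 (hypothetical helper, not shipped with gpcd).
link_gpcd_libs() {
    libdir="$1"
    real="libgpcdlocal.so.0.0.0"
    if [ ! -e "$libdir/$real" ]; then
        echo "link_gpcd_libs: $libdir/$real not found" >&2
        return 1
    fi
    # -f replaces stale links; the subshell keeps the caller's cwd unchanged.
    ( cd "$libdir" &&
      ln -sf "$real" libgpcd.so &&
      ln -sf "$real" libgpcd.so.0 )
}
```

After the links exist, the same library and header directories can be passed, comma separated, to the gpcd option on the LDMS configure line shown earlier.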
- LDMSD Transport
- Determining what value to set for the Completion Queue (CQ) depth for aggregators:
- Recommendation: Set ZAP_UGNI_CQ_DEPTH=65536
- The default CQ depth is 2K. This is not enough for aggregators.
- The CQ contains slots that are consumed when RDMA requests are completed.
- RDMA is requested by:
- lookup
- Happens right after connect
- update
- Happens every time the updater schedules a set update
- push
- Happens when a set registered for push closes a transaction boundary (i.e. sample completes)
- If the low-level completion thread cannot keep up, the CQ may overflow, resulting in GNI_RESOURCE_ERRORs
- Event queue depth
- Recommendation: export ZAP_EVENT_QDEPTH=65536
- I/O events are delivered to threads for handling by the application
- Recommendation: export ZAP_EVENT_WORKERS=8
- To maintain ordering, each endpoint is assigned to one and only one of the I/O worker threads
- Because the handling of an event can take a long time (e.g. storing the updated data), these queues may need to be deeper than expected and the number of threads larger
- If the I/O still cannot keep up due to your system limitations, you may need to split up your store (e.g., use multiple containers, or multiple aggregators writing to different locations), so that the data does not go to a single sink.
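The transport recommendations above amount to three environment variables set before the aggregator starts. A minimal sketch, assuming they are exported in the same shell, job script, or service environment that launches ldmsd:

```shell
#!/bin/sh
# Transport tuning for an aggregator, using the values recommended above.
export ZAP_UGNI_CQ_DEPTH=65536   # completion queue depth (default is 2K)
export ZAP_EVENT_QDEPTH=65536    # depth of the I/O event queues
export ZAP_EVENT_WORKERS=8       # number of I/O worker threads

# Echo the settings so they are recorded in the job or service log.
env | grep '^ZAP_' | sort
```

These are starting points from the notes; if the store still cannot keep up, splitting the data across containers or aggregators is the next step rather than raising the values further.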
- LDMSD Set memory
- Sets occupy memory that is mapped and locked for the purpose of exchanging with a peer using RDMA
- Typically, these transports have limited remote memory access resources. To minimize LDMS utilization of these, we map all set memory a priori and use it as needed to contain instances of sets
- The size of this memory is set with the -m option on the command line. Sets may require anywhere from 2K to 64K depending on the set
- Default values are currently set based on sampler estimates, and will be sufficient even for the aggregator for test cases. For large-scale systems and many sets, you should increase the aggregator set memory size.
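A rough worst-case estimate for aggregator set memory is the number of sets multiplied by the largest per-set size. The sketch below works through that arithmetic. `NUM_SETS` is a hypothetical example value, `SET_KB` uses the 64K upper end quoted above, and the size-suffix form of the printed `-m` value is an assumption to check against the ldmsd man page for your version.

```shell
#!/bin/sh
# Rough upper bound on aggregator set memory.
# NUM_SETS is an example value, not a measurement from any system.
NUM_SETS="${NUM_SETS:-10000}"
SET_KB="${SET_KB:-64}"          # upper end of the 2K to 64K range

total_kb=$((NUM_SETS * SET_KB))
# Round up to whole megabytes.
total_mb=$(( (total_kb + 1023) / 1024 ))

echo "worst-case set memory: ${total_mb}M (${NUM_SETS} sets x ${SET_KB}K)"
echo "candidate option: -m ${total_mb}M"
```

With the example values this comes to 625M. Actual set sizes vary by sampler, so treat the result as a ceiling for planning rather than a required setting.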