Building for Cray with CLE5
valleydlr edited this page Apr 26, 2017
- Unload any non-GNU programming environment and load PrgEnv-gnu:
module unload PrgEnv-pgi
module unload PrgEnv-cray
module unload PrgEnv-intel
# module unload PrgEnv-XXX (for any other loaded PrgEnv-XXX that is not gnu)
module load PrgEnv-gnu
- Verify that the required packages are installed:
rpm -q openssl-devel gcc libevent python-base python-devel gettext-tools libevent-devel
- Additional packages are needed if you want to enable extra features (not covered in this example)
rpm -q libyaml-0-2 libyaml-devel swig
If you do not have access to query the rpm database on your platform, you can check for the presence of some required files:
/usr/share/gettext/config.rpath
/usr/include/openssl/md5.h
/usr/include/python2.6/Python.h
and some files required for the extra features:
/usr/include/yaml.h
/usr/lib64/libyaml.so
/usr/share/swig/2.0.12/python/cstring.i
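The presence checks above can be scripted. A minimal sketch (the helper name `check_files` is illustrative; the paths are the required files listed above):

```shell
# Report any required files that are missing (paths from the list above).
check_files() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return $missing
}

check_files /usr/share/gettext/config.rpath \
            /usr/include/openssl/md5.h \
            /usr/include/python2.6/Python.h \
    || echo "some required files are missing"
```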
- Clone ovis from https://github.com/ovis-hpc/ovis (NOTE: Do not download the zip file)
mkdir ~/Source && cd ~/Source && git clone https://github.com/ovis-hpc/ovis.git ovis
- In this example the ovis clone is assumed to be at ~/Source/ovis, with builds done in ~/Build.
- The release branch to be compiled is OVIS-3.3.0, not master.
- If you are behind a firewall, you may need to set an https_proxy environment variable before the git clone will work.
- Download libevent-2.0.22-stable from libevent.org. You may need to download it on another host and then transfer the archive file to the CLE5 environment; the wget and curl clients in some CLE5 environments do not support more recent web-server SSL standards.
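If a proxy is required, exports like the following can be set before running git clone (the proxy host and port are placeholders for your site's values):

```shell
# Placeholder proxy settings; substitute your site's proxy host and port.
export https_proxy=https://proxy.example.com:3128
export http_proxy=http://proxy.example.com:3128
```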
- In this example source files are assumed to be in ~/Source and installations in ~/Build
cp <path>/libevent-2.0.22-stable.tar.gz ~/Source
cd ~/Source
tar xzf libevent-2.0.22-stable.tar.gz
cd libevent-2.0.22-stable
./autogen.sh
- Create a configure.sh script with the following content:
#!/bin/bash
../configure --prefix=$HOME/Build/libevent-2.0_build && make -j 16 && make install
- Make configure.sh executable:
chmod +x configure.sh
- Create a build directory:
mkdir build
- cd into the build directory and run configure.sh:
cd build
../configure.sh
The result should leave libevent.so in $HOME/Build/libevent-2.0_build/lib
- Get the sources and generate the build system like so:
cd ~/Source/ovis
git checkout OVIS-3.X.Y
git submodule init gpcd-support
git submodule update gpcd-support
git submodule init sos
git submodule update sos
./autogen.sh
- Create a build script (named configure.sh in this example) with the following content:
#!/bin/bash
#
# SYNOPSIS: Remove existing build directories, do the automake routine,
# rebuild, and install everything.
#
# REMARK: This script doesn't do uninstall. If you wish to uninstall (e.g. make
# uninstall), please go into each build directory ( */build-$HOSTNAME ) and
# call make uninstall there, or simply do the following:
#   for D in */build-$HOSTNAME; do pushd $D; make uninstall; popd; done
#
BUILD_PATH=$HOME/Build
PREFIX=$BUILD_PATH/OVIS-3.X
cd $HOME/Source/ovis/build
mkdir -p $PREFIX
#
# add --enable-FEATURE here
ENABLE="--enable-ugni \
  --enable-ldms-python \
  --enable-kgnilnd \
  --enable-lustre \
  --enable-tsampler \
  --enable-cray_power_sampler \
  --enable-cray_system_sampler \
  --enable-aries-gpcdr \
  --enable-aries_mmr"
#
# add --disable-FEATURE here
DISABLE="--disable-rpath \
  --disable-readline \
  --disable-baler \
  --disable-sos \
  --disable-mmap"
#
# libevent2 prefix
LIBEVENT_PREFIX=$BUILD_PATH/libevent-2.0_build
#
WITH_CRAY="--with-rca=/opt/cray/rca/default --with-krca=/opt/cray/krca/default --with-cray-hss-devel=/opt/cray-hss-devel/default --enable-gpcdlocal"
WITH_CRAY="$WITH_CRAY --with-libevent=$LIBEVENT_PREFIX"
#
CFLAGS='-g -O3 -Wl,-z,defs'
#
# Exit immediately if a command fails; echo commands as they run
set -e
set -x
#
$HOME/Source/ovis/configure --prefix=$PREFIX --with-pkglibdir=ovis-lib \
  $ENABLE $DISABLE $WITH_CRAY \
  CFLAGS="$CFLAGS" LDFLAGS=$LDFLAGS CPPFLAGS=$CPPFLAGS
- Make configure.sh executable:
chmod +x configure.sh
- Create a build directory:
mkdir build
- cd into the build directory and run the build under script(1) to capture the output in install.log, like so:
cd build
script -c '../configure.sh && make -j 16 && make install && echo success' install.log
- Review the output in install.log if the final output on the screen does not end with 'success'
- A simple test can be run immediately (it does not use Cray-specific plugins):
~/Build/OVIS-3.3/bin/ldms_local_usertest.sh
This exercises the generic socket transport, sampler, aggregator, and store plugins. The screen output should end with a message like:
logs and data stored under /tmp/username/ldmstest/142819
done
- README files and sample scripts for the XC40 running Rhine Redwood reside in the ~/Source/ovis/util/sample_init_scripts/XC40_RR directory.
- Note1: The scripts must be modified, as described in the README file, to fit a particular deployment configuration.
- Note2: The "export LD_LIBRARY_PATH=" line in the ldmsd.conf file has an error. It should look the same as the one in the ldms_env file. In this case:
export LD_LIBRARY_PATH=$TOP/OVIS-3.3/lib/:$TOP/OVIS-3.3/lib/ovis-ldms:$TOP/OVIS-3.3/lib/ovis-lib:$TOP/libevent-2.0_build/lib:$LD_LIBRARY_PATH
- Note3: The "export LDMSD_PLUGIN_PATH=" line in the ldmsd.conf and ldms_env files has an error. It should look the same in both. In this case:
export LDMSD_PLUGIN_LIBPATH=$TOP/OVIS-3.3/lib/ovis-lib/
- Write a script to set up environment (in this example the script will be named ldms_env). Note: If you want to use the ugni transport for RDMA data transfers you will need to configure a protection domain and obtain the corresponding cookie value for assignment to the ZAP_UGNI_COOKIE environment variable.
#!/bin/bash
#
TOP=$HOME/Build
export LD_LIBRARY_PATH=$TOP/OVIS-3.3/lib/:$TOP/OVIS-3.3/lib/ovis-ldms:$TOP/OVIS-3.3/lib/ovis-lib:$TOP/libevent-2.0_build/lib:$LD_LIBRARY_PATH
export LDMSD_PLUGIN_LIBPATH=$TOP/OVIS-3.3/lib/ovis-lib/
export ZAP_LIBPATH=$TOP/OVIS-3.3/lib/ovis-lib/
export PATH=$TOP/OVIS-3.3/sbin:$PATH
#
# Use this if using shared secret authentication
export LDMS_AUTH_FILE=<absolute path to ovis/etc>/shared_secret
#
# Use the following if running on a Cray XC ############################
# Will need to configure a protection domain cookie first
export ZAP_UGNI_PTAG=0
export ZAP_UGNI_COOKIE=<hex value of cookie e.g., 0x86bb0000>
#
# Set interval for periodically checking node state. Note that for ldms_ls
# this should still be defined
export ZAP_UGNI_STATE_INTERVAL=1000000
# Set offset relative to 0 seconds. Typically set to something negative so
# the refresh is just before an aggregation
export ZAP_UGNI_STATE_OFFSET=-10000
########################
- Write a configuration file to be used to start a LDMS daemon (ldmsd) with the "meminfo" sampler plugin (in this example this file will be called meminfo_configuration)
load name=meminfo
config name=meminfo producer=nid00012 component_id=12 instance=nid00012/meminfo
start name=meminfo interval=1000000 offset=0
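The interval and offset values are in microseconds, so interval=1000000 samples once per second, with offset=0 aligning samples to the top of each second. The arithmetic below is an illustrative sketch of how the next aligned wakeup time follows from these two values; it is not ldmsd's actual scheduler code:

```shell
# Illustrative only: derive the next aligned sample time from interval/offset
# (both in microseconds, matching the sampler configuration above).
interval_us=1000000
offset_us=0
now_us=$(( $(date +%s) * 1000000 ))
next_us=$(( ( now_us / interval_us + 1 ) * interval_us + offset_us ))
echo "next sample at epoch microsecond ${next_us}"
```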
- Start a ldmsd as a sampler using meminfo_configuration. Note: This set of examples shows use of the "sock" transport. If using "ugni" you will replace all "sock" arguments to the -x flag with "ugni" in these examples. Also "xprt=sock" will need to be replaced with "xprt=ugni" in the agg_configuration examples below.
source ldms_env
ldmsd -x sock:60411 -S /tmp/ldmsd_sock -v CRITICAL -l /tmp/ldmsd_log -r /tmp/ldmsd.pid -c ./meminfo_configuration
- Make sure the ldmsd is running:
# ps auxw | grep ldmsd
root 40662 0.0 0.0 383032 2120 ? Ssl 11:08 0:00 ldmsd -x sock:60411 -S /tmp/ldmsd_sock -v CRITICAL -l /tmp/ldmsd_log -r /tmp/ldmsd.pid -c ./meminfo_configuration
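Since ldmsd was started with -r /tmp/ldmsd.pid, the pid file offers an alternative liveness check. A sketch (the helper name `ldmsd_running` is illustrative; it reports "not running" if the pid file is absent or the process is gone):

```shell
# Check liveness via the pid file written by ldmsd's -r flag.
ldmsd_running() {
    pidfile=$1
    pid=$(cat "$pidfile" 2>/dev/null)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "not running"
    fi
}

ldmsd_running /tmp/ldmsd.pid
```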
- Now use the "ldms_ls" utility to check the metric sets
- List sets being hosted by this ldmsd
$ ldms_ls -h localhost -x sock -p 60411
nid00012/meminfo
- More verbose listing
$ ldms_ls -h localhost -x sock -p 60411 -v
nid00012/meminfo: consistent, last update: Fri Nov 25 11:16:49 2016 [1401us]
METADATA --------
Producer Name : nid00012
Instance Name : nid00012/meminfo
Schema Name   : meminfo
Size          : 1856
Metric Count  : 44
GN            : 2
DATA ------------
Timestamp     : Fri Nov 25 11:16:49 2016 [1401us]
Duration      : [0.000048s]
Consistent    : TRUE
Size          : 392
GN            : 9677
-----------------
- Long listing that includes metric names, data types, and current values (Note: "M" designates a value as meta-data and "D" designates data)
$ ldms_ls -h localhost -x sock -p 60411 -l
nid00012/meminfo: consistent, last update: Fri Nov 25 11:18:18 2016 [1599us]
M u64 component_id 12
D u64 job_id 0
D u64 MemTotal 132163924
D u64 MemFree 129978224
D u64 Buffers 0
D u64 Cached 158704
D u64 SwapCached 0
D u64 Active 75064
D u64 Inactive 142104
D u64 Active(anon) 67168
D u64 Inactive(anon) 123720
D u64 Active(file) 7896
D u64 Inactive(file) 18384
D u64 Unevictable 4080
D u64 Mlocked 749688
D u64 SwapTotal 0
D u64 SwapFree 0
D u64 Dirty 0
D u64 Writeback 0
D u64 AnonPages 62584
D u64 Mapped 13348
D u64 Shmem 131092
D u64 Slab 209900
D u64 SReclaimable 14232
D u64 SUnreclaim 195668
D u64 KernelStack 5256
D u64 PageTables 1960
D u64 NFS_Unstable 0
D u64 Bounce 0
D u64 WritebackTmp 0
D u64 CommitLimit 66081960
D u64 Committed_AS 289428
D u64 VmallocTotal 34359738367
D u64 VmallocUsed 2245696
D u64 VmallocChunk 34290198256
D u64 HardwareCorrupted 0
D u64 HugePages_Total 0
D u64 HugePages_Free 0
D u64 HugePages_Rsvd 0
D u64 HugePages_Surp 0
D u64 Hugepagesize 2048
D u64 DirectMap4k 7156
D u64 DirectMap2M 1955840
D u64 DirectMap1G 134217728
- Write a configuration file for an aggregator (in this example called agg_configuration)
prdcr_add name=grp1.nid00012.60411 host=nid00012 port=60411 xprt=sock type=active interval=30000000
prdcr_start name=grp1.nid00012.60411
updtr_add name=grp1 interval=1000000 offset=100000
updtr_prdcr_add name=grp1 regex=grp1..*
updtr_start name=grp1
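When aggregating from many nodes, the per-producer prdcr_add/prdcr_start pairs can be generated rather than written by hand. A sketch (the helper name `gen_agg_config` and the node list are illustrative; the group name, port, and intervals mirror the example above):

```shell
# Emit a producer/updater configuration for a list of compute nodes.
gen_agg_config() {
    for nid in "$@"; do
        echo "prdcr_add name=grp1.${nid}.60411 host=${nid} port=60411 xprt=sock type=active interval=30000000"
        echo "prdcr_start name=grp1.${nid}.60411"
    done
    echo "updtr_add name=grp1 interval=1000000 offset=100000"
    echo "updtr_prdcr_add name=grp1 regex=grp1..*"
    echo "updtr_start name=grp1"
}

gen_agg_config nid00012 nid00013 > agg_configuration
```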
- Start a ldmsd as an aggregator using agg_configuration (Note: make sure to use a different port if running on the same host as your sampler)
$ ldmsd -x sock:60412 -S /tmp/ldmsd_sock_agg -m 2GB -P 16 -v CRITICAL -l /tmp/ldmsd_log_agg -r /tmp/ldmsd_agg.pid -c ./agg_configuration
- Note1: If aggregating from a substantial number of hosts, you will want to specify how much memory to allocate to this daemon with the "-m" flag and how many threads with the "-P" flag. You will also want to increase the number of threads available to the transport and the queue depth via the ZAP_EVENT_WORKERS and ZAP_EVENT_QDEPTH environment variables. These can be added to the ldms_env script above like:
export ZAP_EVENT_WORKERS=16   # sets the number of ZAP (transport) worker threads to 16
export ZAP_EVENT_QDEPTH=65536 # sets the queue depth buffer to 65536 entries
- Note2: You may also need to increase the number of file descriptors that can be concurrently open depending on the number of hosts being aggregated from. This can be added to the ldms_env script like:
ulimit -n 100000   # sets the limit to 100 thousand
- Check to make sure the daemon is running and contains the expected sets as above for the sampler daemon
- Add storage configuration by editing your agg_configuration file to look like this:
prdcr_add name=grp1.nid00012.60411 host=nid00012 port=60411 xprt=sock type=active interval=30000000
prdcr_start name=grp1.nid00012.60411
updtr_add name=grp1 interval=1000000 offset=100000
updtr_prdcr_add name=grp1 regex=grp1..*
updtr_start name=grp1
load name=store_csv
config name=store_csv path=/tmp/LDMS_CSV action=init altheader=1 buffer=0
strgp_add name=meminfo-csv_store plugin=store_csv container=csv schema=meminfo
strgp_start name=meminfo-csv_store
- kill aggregator ldmsd (if currently running)
kill <pid of aggregator ldmsd>
- Re-run using updated configuration file
$ ldmsd -x sock:60412 -S /tmp/ldmsd_sock_agg -m 2GB -P 16 -v CRITICAL -l /tmp/ldmsd_log_agg -r /tmp/ldmsd_agg.pid -c ./agg_configuration
- You should now see the directory /tmp/LDMS_CSV/csv with files "meminfo" and "meminfo.HEADER" in it
- For more aggregator and store configuration options please refer to the man pages in:
<build dir>/share/man/man8/ldmsd.8
<build dir>/share/man/man7/Plugin_store_csv.7