-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathINSTALL
163 lines (124 loc) · 8.64 KB
/
INSTALL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
Requirements
============
1. DataSpaces requires MPI.
2. DataSpaces typically requires a filesystem that supports flock() system calls. There are work-arounds for this for some transports.
Quick Installation Instructions
===============================
The following steps are needed to configure, build and install DataSpaces.
1. Configure
# Examples for the configure command on supported architectures and networks. Please customize it according to your specific system configurations and environments (If there is no 'configure' file, please run ./autogen.sh to generate it.).
# Cray XE, XK, XC series
$ ./configure CC=cc FC=ftn
## On Cray platforms, DataSpaces requires the use of Cray protection domains for uGNI network communication.
Protection domains are a Cray uGNI security feature that allow different applications to perform RDMA transfers
to one another on the uGNI network and consist of a ptag and cookie (or, for Aries, 2 cookie) pair. Please refer
to the documentation below for DataSpaces configuration options based on the Cray uGNI network of your system.
For more general information on protection domains, please see the Cray uGNI Documentation.
If you are creating a user-level protection domain using apmgr -c, please note that you must also modify
your run command to pass the credential as `aprun -p MY_PDOMAIN_NAME`.
## Cray uGNI Gemini Systems
If your system allows users to create user-level protection domains, `apmgr pdomain -c [YOUR-DOMAIN-NAME]`
will create it and generate a unique ptag/cookie pair. You can then obtain the ptag/cookie information by
running `apstat -P`. If your system does not allow user-level pdomains, you may see if any system-level domains
are available (as defined by the field *Type* in the output of the `apstat -P` command). If neither options are
available, please contact your system administrator for help creating and configuring protection domains.
Once you have a protection domain, you may use ONE of the following options:
1. If you do not expect your ptag or cookie to change, you may compile it directly into your coded
directly by using the configure options:
$ ./configure CC=cc FC=ftn --with-gni-ptag=[GNI-PTAG-VALUE] --with-gni-cookie=[GNI-HEXA-COOKIE]
2. If you wish to provide the name of your domain at runtime, you can add `export DSPACES_GNI_PDOMAIN=[YOUR-DOMAIN-NAME]`
to your job script. This method only works if the compute nodes are able to execute `apstat -P`.
3. If you wish to provide the ptag and cookie directly at runtime (i.e., if the compute nodes do not have the ability to execute `apstat -P`),
you can add `export DSPACES_GNI_PTAG=[GNI-PTAG-VALUE]` and `export DSPACES_GNI_COOKIE=[GNI-HEXA-COOKIE]` to your job script.
NOTE that on Gemini systems, you must provide BOTH the ptag AND cookie if using options 1 or 3.
## Cray uGNI Aries Systems
On Aries systems, there are two possible network configurations as set by machine administrators:
(1) uGNI network has the Dynamic RDMA Credential (DRC) service enabled and (2) DRC service unavailable or not configured.
For specific instructions for each case, see below.
Note that regardless of the network configuration, the ptag is generated on-the-fly by the uGNI network on ALL Aries systems and should not be provided a priori.
### Where Dynamic RDMA Credentials (DRC) is Enabled
DataSpaces is integrated wth DRC and manages the creation and exchange of credentials amongst all applications in a job.
Ensure that the rdma-credentials binaries and libraries are in your path when compiling.
$ ./configure CC=cc FC=ftn --enable-drc
### Where No DRC is Available
Follow the instructions outlined in the Gemini section but note that you will not be providing the ptag in option 1 or 3.
$ ./configure CC=cc FC=ftn --with-gni-cookie=[GNI-HEXA-COOKIE]
# Infiniband cluster
$ ./configure CC=mpicc FC=mpif90
* For TACC Stampede InfiniBand Cluster
$ module unload intel
$ module unload mvapich2/2.1
$ module load intel/13.0.2.146
$ module load mvapich2/1.9a2
$ ./configure CC=mpicc FC=mpif90 CFLAGS="-DHAVE_INFINIBAND -L/opt/ofed/lib64/ -lrdmacm -I/opt/ofed/include/" --with-ib-interface=ib
# TCP/IP
$ ./configure CC=[C_COMPILER_OPTION] FC=[Fortran_COMPILER_OPTION] --enable-dart-tcp
* [C_COMPILER_OPTION]: C compiler in cluster; [Fortran_COMPILER_OPTION]: Fortran compiler in cluster.
2. Some of the useful configure options
--prefix=/install/path
Set the directory where DataSpaces will be installed.
--with-max-num-array-dimension=integer value
This option is used to configure the maximum number
of array dimension that can be supported in
DataSpaces/DIMES. Default value is set as 3. Note:
the value can not be set larger than 10, or smaller than 3.
--enable-dimes
Build DataSpaces with DIstributed MEmory Space (DIMES) support.
DIMES feature uses local RDMA memory buffers across application processes to build
a virtual shared data space in-situ, which can be queried and directly accessed by
other applications in a data coupling workflow. Currently this feature is only
supported on Cray Gemini networks, Infiniband cluster and on IBM BG/P and BG/Q systems.
--with-dimes-rdma-buffer-size=integer value for megabytes
This option specifies the maximum amount of RDMA memory buffer that
can be used by DIMES in each application process. DIMES RDMA memory buffer is used
to locally cache the data written by dimes_put(), and is also used by
dimes_get() to fetch data in remote memory buffer. Must be used with
--enable-dimes option. Default value is set as 64MB.
--with-infiniband-msg-queue-size=integer value
User can use this option to change the size of the messaging queue on DataSpaces servers.
If you receive errors stating the messaging queue is full. You should use this option to
increase the size of the messaging queue.
For applications using DIMES, dimes_put_sync_all() or dimes_put_sync_group()
API functions (see dimes_interface.h for detailed documentation) should be
called periodically to free data objects cached in DIMES. Where there is not
sufficient memory space, dimes_put()/dimes_get() fails and return errors.
--enable-drc
Build DataSpaces for Cray uGNI systems with Dynamic RDMA Credential (DRC) support. Note that using this method, users should not provide a ptag or cookie a priori.
--with-dimes-rdma-buffer-size=integer value for megabytes
This option specifies the maximum amount of RDMA memory buffer that
can be used by DIMES in each application process. DIMES RDMA memory buffer is used
to locally cache the data written by dimes_put(), and is also used by
dimes_get() to fetch data in remote memory buffer. Must be used with
--enable-dimes option. Default value is set as 64MB.
--enable-dart-tcp
This option will enable DataSpaces to build on ethernet using TCP/IP socket interfaces.
Since socket is available in most of the current clusters, this option will avoid the
conflict of selections between IP network and other advanced networks (e.g. IB, Gemini).
--with-gni-ptag = PTAG DECIMAL VALUE
--with-gni-cookie = COOKIE HEXA VALUE
This two options directly give the values of ptag+cookie pair to DataSpaces in uGNI
based systems. When DataSpaces is running at Aries-based systems, ptag will be
automatically assigned by the system itself. Therefore, in those systems, only
--with-gni-cookie is required to provided and activated.
--with-infiniband-msg-queue-size=integer value
User can use this option to change the size of the messaging queue on DataSpaces servers.
If you receive errors stating the messaging queue is full. You should use this option to
increase the size of the messaging queue.
For applications using DIMES, dimes_put_sync_all() or dimes_put_sync_group()
API functions (see dimes_interface.h for detailed documentation) should be
called periodically to free data objects cached in DIMES. Where there is no
sufficient memory space, dimes_put()/dimes_get() would fail and return errors.
--enable-shmem
Use this option to enable Hybrid Staging. Read operations will use shared memory transport
instead of RDMA when the target of the read is colocated on the same node.
--enable-sync-msg
Client waits for RPC-based confirmation from server before completing dspaces_put_sync() call. This
allows more accurate timing information where the transport does not issue a completion event for
remotely triggered reads (i.e. Infiniband.)
3. Build & Install
# Under project top directory
$ make
$ make install
For more information on installing DataSpaces and running the examples, or
if you have any other questions, please visit the following web page:
http://personal.cac.rutgers.edu/TASSL/projects/data/support.html