CoMet Quick Start Guide
-----------------------
1. How to build
----------------
The following is an example of how to build CoMet on the OLCF Summit system:
export OLCF_PROJECT=stf006 #---replace stf006 with your OLCF account id.
cd $MEMBERWORK/$OLCF_PROJECT
mkdir comet_work
cd comet_work
module load git
git clone https://code.ornl.gov/wjd/genomics_gpu.git
# OPTIONAL:
# export COMET_REPO_DIR=$PWD/genomics_gpu
# export COMET_BUILDS_DIR=$PWD
# export COMET_INSTALLS_DIR=$PWD/installs
./genomics_gpu/scripts/configure_all.sh
./genomics_gpu/scripts/make_all.sh
One can also optionally run the tester for this case as follows:
OLCF_PROJECT=stf006 #---replace stf006 with your OLCF account id.
bsub -P $OLCF_PROJECT -Is -nnodes 2 -W 120 -alloc_flags gpumps $SHELL
cd $MEMBERWORK/$OLCF_PROJECT/comet_work
./genomics_gpu/scripts/test_all.sh
NOTES:
- By default, the build process builds several code versions
(single vs. double precision, release vs. test/debug, non-MPI vs. MPI).
- The choice of single vs. double precision impacts what form of arithmetic
is used for the Czekanowski method.
- Using single precision for CCC and DUO has little impact on the numerics
of the calculation but enables certain performance optimizations
such as metrics compression.
- The code uses an out-of-tree build system. By default the build directory
is placed in the working directory where the configure and build scripts are run.
2. Methods
-----------
CoMet computes comparisons of all pairs (or triples) of vectors in a given
set of vectors in order to identify correlations between similar vectors.
The structure of the computation is similar to a large distributed symmetric
matrix-matrix product (2-way) or tensor product (3-way). It supports several
comparison metrics:
- the Proportional Similarity (Czekanowski) metric, which takes real-valued
vectors and produces a single real-valued number for each vector comparison;
- the CCC and DUO metrics, whose inputs are vectors of 2-bit entries and
whose outputs are a 2X2 or 2X2X2 table of real-valued entries for each
vector comparison.
CoMet is parallelized using MPI and accelerated on modern GPUs. It supports
features such as staging to compute partial results over a series of runs,
thresholding to write results only for highly correlated vectors, and handling
of problems with incomplete data.
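To give a feel for problem size, the number of vector comparisons grows quadratically (2-way) or cubically (3-way) with the number of vectors. The following Python sketch counts the unique pairs or triples; the function name num_comparisons is illustrative only and is not part of CoMet.

```python
import math

def num_comparisons(num_vector: int, num_way: int = 2) -> int:
    """Count the unique unordered pairs (2-way) or triples (3-way) of vectors."""
    if num_way not in (2, 3):
        raise ValueError("num_way must be 2 or 3")
    return math.comb(num_vector, num_way)

print(num_comparisons(4, 2))        # 6
print(num_comparisons(4, 3))        # 4
print(num_comparisons(150000, 2))   # 11249925000
```

These counts appear to agree with the vcmp values in the sample output of section 5 (6 and 4 for the four-vector cases, roughly 1.125e+10 for 150000 vectors).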
For further information please refer to the following:
W. Joubert, J. Nance, D. Weighill, D. Jacobson,
"Parallel Accelerated Vector Similarity Calculations for Genomics Applications,"
Parallel Computing, vol. 75, July 2018, pp. 130-145,
https://www.sciencedirect.com/science/article/pii/S016781911830084X,
https://arxiv.org/abs/1705.08210.
W. Joubert, J. Nance, S. Climer, D. Weighill, D. Jacobson,
"Parallel Accelerated Custom Correlation Coefficient Calculations
for Genomics Applications," Parallel Computing 84 (2019), 15-23,
https://www.sciencedirect.com/science/article/pii/S0167819118301431,
https://arxiv.org/abs/1705.08213
Wayne Joubert, Deborah Weighill, David Kainer, Sharlee Climer, Amy Justice,
Kjiersten Fagnan, Daniel Jacobson, "Attacking the Opioid Epidemic:
Determining the Epistatic and Pleiotropic Genetic Architectures
for Chronic Pain and Opioid Addiction," SC18 Gordon Bell Award paper,
https://dl.acm.org/citation.cfm?id=3291732
"GPU-enabled comparative genomics calculations on leadership-class HPC
systems," http://on-demand.gputechconf.com/gtc/2017/presentation/s7156-wayne-joubert-comparative.pdf
"CoMet: An HPC application for comparative genomics calculations,"
https://www.olcf.ornl.gov/wp-content/uploads/2017/11/2018UM-Day1-Joubert.pdf
3. Options
-----------
genomics_metric: calculation of comparison metrics from genomics data
Usage:
genomics_metric <option> ...
Options:
--num_field <value>
the total number of elements in each vector. Either num_field or
num_field_local must be specified.
--num_field_local <value>
the number of elements in each vector on each process (MPI rank).
Either num_field or num_field_local must be specified.
--num_vector <value>
the total number of vectors. Either num_vector or num_vector_local
must be specified.
--num_vector_local <value>
the number of vectors on each process (MPI rank). Either num_vector
or num_vector_local must be specified.
--metric_type <value>
metric type to compute (czekanowski=Czekanowski (default),
ccc=CCC, duo=DUO)
--ccc_multiplier <value>
front multiplier value used to calculate the CCC metric
(default floating point value is 4.5 for CCC).
--duo_multiplier <value>
front multiplier value used to calculate the DUO metric
(default floating point value is 4.0 for DUO).
--ccc_param <value>
fixed coefficient value used to calculate the CCC or DUO metric
(default floating point value is 2/3).
--sparse <value>
for CCC and DUO metric, interpret each vector entry set to binary
"10" as a missing data element (yes=yes, no=no (default))
--num_way <value>
dimension of metric to compute (2=2-way (default), 3=3-way)
--all2all <value>
whether to perform global all-to-all rather than computing
on each processor separately (yes=yes, no=no (default))
--compute_method <value>
manner of computing the result (CPU=cpu, GPU=gpu (default),
REF=reference implementation (slower, computed on CPU))
--tc <value>
for CCC and DUO, perform computation using a standard GEMM computation
that employs special hardware such as GPU tensor cores when available
(0=no (default), 1=fp16/fp32, 2=int8/int32, 3=fp32, 4=auto,
5=int1/int32, 6=int4/int32)
--num_tc_steps <value>
for tc methods, tuning parameter to reduce memory usage
by breaking GEMM into multiple steps (default 1)
--num_proc_vector <value>
blocking factor to denote number of blocks used to decompose
the total number of vectors across processes (MPI ranks)
(default is the total number of procs requested)
--num_proc_field <value>
blocking factor to denote number of blocks used to decompose
each vector across processes (MPI ranks) (default is 1)
--num_proc_repl <value>
process replication factor. For each block along the vector
and field axes, this number of processes (MPI ranks) is applied to
computations for the block (default is 1)
--num_stage <value>
the number of stages the computation is divided into, for breaking
the run campaign into smaller parts and reducing the memory footprint
(default is 1) (available for 3-way case only)
--stage_min <value>
the lowest stage number of the sequence of stages to be computed
for this run (0-based, default is 0)
--stage_max <value>
the highest stage number of the sequence of stages to be computed
for this run (0-based, default is num_stage-1)
--num_phase <value>
the number of phases the computation is divided into, for breaking
the run campaign into smaller parts and reducing the memory footprint
(default is 1)
--phase_min <value>
the lowest phase number of the sequence of phases to be computed
for this run (0-based, default is 0)
--phase_max <value>
the highest phase number of the sequence of phases to be computed
for this run (0-based, default is num_phase-1)
--input_file <value>
string denoting the filename or file pathname of binary file
containing all input vectors. If this option is not present,
a synthetic test case is run.
--problem_type <value>
the kind of synthetic test case to run. Allowed choices are
analytic (default) or random.
--output_file_stub <value>
string denoting the filename or pathname stub of files
used to store result metrics. Metric values are written to files
whose names are formed by appending a unique identifier
(e.g., process number) to the end of this string. If this
option is absent, no output files are written.
--histograms_file <value>
string denoting the filename or pathname of text file
used to store histograms of metrics entries, used for scoping runs
to determine metric value thresholds. Note all computed metrics
entries are histogrammed irrespective of thresholding. Only available
for CCC and DUO metrics.
--threshold <value>
output each metric result value only if its magnitude is greater than
this threshold. If set negative, no thresholding is done
(default -1)
--threshold <valueLL>,<valueLH>,<valueHH>,<valueLLHH>
--threshold <valueLLL>,<valueLLH>,<valueLHH>,<valueHHH>,<valueLLLHHH>
alternate threshold option for CCC and DUO methods (2-way, 3-way forms,
respectively), used to specify individual thresholds for different
table entries. For example, for 3-way, <valueLLH> is the threshold
for table entries (0,0,1) and also table entries with equivalent index
permutations (0,1,0), (1,0,0). Setting <valueLLHH> causes the
additional output of table entries (0,0) and (1,1) (output as individual
values) if both entries are positive and if both summed together
exceed this threshold; <valueLLLHHH> is analogous for 3-way.
All thresholds can be disabled by being set negative; otherwise
all thresholds must be nonnegative.
--metrics_shrink <value>
anticipated reduction factor in the number of metric entries
stored due to thresholding, used to reduce CPU memory footprint
to allow larger problems to be solved. For example, a value of
10 specifies that memory will be allocated assuming no more than
1/10 of metric entries pass threshold for any stage, phase or
process. (default 1.0)
--checksum <value>
compute checksum of the metrics results that pass threshold
(yes=yes (default), no=no)
--verbosity <value>
verbosity level of output (0=none, 1=some (default), 2,3=more)
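The phase options are intended to split one large calculation into a series of smaller runs. As a rough sketch, the Python helper below generates one argument list per phase, using only the --num_phase, --phase_min and --phase_max options described above. The helper name phase_args and the choice of exactly one phase per run are assumptions made for illustration; other groupings of phases per run are equally possible.

```python
def phase_args(base_args, num_phase):
    """Return one argument list per phase, each run covering a single phase."""
    runs = []
    for phase in range(num_phase):
        runs.append(list(base_args) + [
            "--num_phase", str(num_phase),
            "--phase_min", str(phase),   # phases are 0-based
            "--phase_max", str(phase),
        ])
    return runs

# Problem-size options taken from the larger example in section 5.
base = ["--num_field", "20000", "--num_vector", "150000",
        "--metric_type", "czekanowski", "--num_way", "2", "--all2all", "yes"]
for args in phase_args(base, 4):
    print("genomics_metric " + " ".join(args))
```

Each printed line would still need to be wrapped in the appropriate job launcher command for the target system.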
4. File Formats
----------------
All CoMet I/O makes use of binary files for speed and ease of indexing.
See the tools/ directory of the CoMet repository for tools to convert
between binary and human-readable text formats.
For both input and output, the endianness of integer and floating point
values in the files matches that of the system on which the code is run.
4.1 Input File Formats
-----------------------
CoMet input is stored as a single binary file; each process reads the part
of the file it needs. Elements of the matrix of vectors are stored in
lexicographical order in the file, with the field dimension varying
fastest. Raw values are stored in a packed fashion with no indexing
data; dimensions are supplied via command line arguments.
For the Czekanowski metric, values are stored packed in sequence as
4-byte floats or 8-byte doubles, depending on the code version being
used.
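As an illustration of this layout, the following Python sketch packs a small set of vectors as native-endian floats with the field index varying fastest. The function names are hypothetical, and the tools/ directory of the repository remains the authoritative way to create input files.

```python
import struct

def pack_float_vectors(vectors, double_precision=False):
    """Pack vectors as raw floats, one vector after another, no header."""
    code = "d" if double_precision else "f"   # 8-byte double or 4-byte float
    num_field = len(vectors[0])
    out = bytearray()
    for vec in vectors:
        if len(vec) != num_field:
            raise ValueError("all vectors must have the same number of fields")
        # "=" selects native byte order with standard sizes and no padding.
        out += struct.pack("=" + code * num_field, *vec)
    return bytes(out)

def write_input_file(path, vectors, double_precision=False):
    """Write the packed vectors to a binary file."""
    with open(path, "wb") as f:
        f.write(pack_float_vectors(vectors, double_precision))

vectors = [[1.0, 2.0], [3.0, 4.0], [2.0, 1.0], [4.0, 3.0]]
print(len(pack_float_vectors(vectors)))   # 32
```

A file written this way would be read with --num_field 2 --num_vector 4, since the file itself carries no dimension information.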
CCC/DUO 2-bit values are packed 4 per byte, starting at the least significant
bit of the byte. For the 2 bits, the higher order bit is considered the
"first" bit and the lower order bit the "second"; thus
for "01" or "(0,1)" the first bit is "0" and the second bit is "1".
Note for the sparse case "(1,0)" is the marker for a missing vector entry.
The last byte of each vector is padded with zeros for the high-order bits
before the next vector in the file is started.
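The packing rule can be sketched in Python as follows. This is one reading of the description above, in which entry k of a vector occupies bits 2k and 2k+1 of its byte and each vector starts on a fresh byte; the function names are hypothetical.

```python
def pack_2bit_vector(entries):
    """Pack 2-bit entries (ints 0..3) four per byte, first entry in the lowest bits."""
    out = bytearray((len(entries) + 3) // 4)   # unused high-order bits stay zero
    for i, value in enumerate(entries):
        if not 0 <= value <= 3:
            raise ValueError("each entry must fit in 2 bits")
        out[i // 4] |= value << (2 * (i % 4))
    return bytes(out)

def pack_2bit_vectors(vectors):
    """Concatenate packed vectors; each vector starts on a new byte."""
    return b"".join(pack_2bit_vector(v) for v in vectors)

# The four 2-field vectors used by the small CCC examples in section 5.
vectors = [[0b00, 0b01], [0b10, 0b11], [0b01, 0b00], [0b11, 0b10]]
print(pack_2bit_vectors(vectors).hex())   # 040e010b
```

With two fields per vector, only the low 4 bits of each byte carry data and the high 4 bits are zero padding.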
4.2 Output File Formats
------------------------
Output files are written one file per process. The files are written
in binary format as a packed series of results, written in no particular
order. Each result is stored as two (for 2-way) or three (for 3-way)
4-byte unsigned integer indices, followed by a 4-byte floating point
metric value.
For the Czekanowski metric, each integer index denotes the (0-based) vector
number relevant to the metric value.
For CCC and DUO, the lowest order bit of the integer denotes
the 0/1 index into the relevant 2X2 (or 2X2X2) table entry being written,
and all other bits of the integer denote the (0-based) vector number.
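A minimal Python reader for this record layout might look like the sketch below. It assumes the file was written on a machine with the same endianness as the one reading it; the function name decode_records and its bit_metric flag are illustrative and not part of CoMet.

```python
import struct

def decode_records(data, num_way=2, bit_metric=False):
    """Decode packed output records from a bytes object.

    Each record is num_way uint32 indices followed by one float32 value.
    With bit_metric=True (CCC/DUO), the lowest bit of each index is the
    table index and the remaining bits are the vector number.
    """
    record = struct.Struct("=" + "I" * num_way + "f")
    results = []
    for fields in record.iter_unpack(data):
        indices, value = fields[:num_way], fields[num_way]
        if bit_metric:
            vector_nums = tuple(i >> 1 for i in indices)
            table_index = tuple(i & 1 for i in indices)
            results.append((vector_nums, table_index, value))
        else:
            results.append((tuple(indices), value))
    return results

# Synthetic records built in memory for demonstration only.
sample = struct.pack("=IIf", 0, 1, 0.5) + struct.pack("=IIf", 2, 3, 0.75)
print(decode_records(sample))   # [((0, 1), 0.5), ((2, 3), 0.75)]
```

To read an actual output file, pass the result of open(path, "rb").read() for one of the per-process files.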
5. Execution Examples
----------------------
# The following test runs assume:
OLCF_PROJECT=stf006 #---replace stf006 with your OLCF account id.
bsub -P $OLCF_PROJECT -Is -nnodes 2 -W 120 $SHELL
cd $MEMBERWORK/$OLCF_PROJECT/comet_work
# The following code also assumes bash shell.
#--------------------
# Small case, synthetic test problem, 2-way Czekanowski metric.
#--------------------
AR_PERFORMANCE_FLAGS="PAMI_IBV_ENABLE_DCT=1 PAMI_ENABLE_STRIPING=1 PAMI_IBV_ADAPTER_AFFINITY=0 PAMI_IBV_QP_SERVICE_LEVEL=8 PAMI_IBV_ENABLE_OOO_AR=1"
EXECUTABLE="./install_release_summit/bin/genomics_metric"
env OMP_NUM_THREADS=7 $AR_PERFORMANCE_FLAGS \
jsrun --nrs 1 --rs_per_host 1 \
--tasks_per_rs 1 --cpu_per_rs 7 --bind packed:7 --gpu_per_rs 1 -X 1 \
$EXECUTABLE \
--num_field 2 --num_vector 4 --num_proc_vector 1 \
--metric_type czekanowski --num_way 2 \
--compute_method GPU --all2all yes --verbosity 3
vec_proc 0 vec 0 field_proc 0 field 0 value 1.000000e+00
vec_proc 0 vec 0 field_proc 0 field 1 value 2.000000e+00
vec_proc 0 vec 1 field_proc 0 field 0 value 3.000000e+00
vec_proc 0 vec 1 field_proc 0 field 1 value 4.000000e+00
vec_proc 0 vec 2 field_proc 0 field 0 value 2.000000e+00
vec_proc 0 vec 2 field_proc 0 field 1 value 1.000000e+00
vec_proc 0 vec 3 field_proc 0 field 0 value 4.000000e+00
vec_proc 0 vec 3 field_proc 0 field 1 value 3.000000e+00
element (0,1): value: 5.99999999999999978e-01
element (0,2): value: 6.66666666666666630e-01
element (1,2): value: 5.99999999999999978e-01
element (0,3): value: 5.99999999999999978e-01
element (1,3): value: 8.57142857142857095e-01
element (2,3): value: 5.99999999999999978e-01
metrics checksum 0-82898256547-645082804690974176 ctime 0.003131 ops 8.000000e+01 ops_rate 2.554971e+04 ops_rate/proc 2.554971e+04 vcmp 6.000000e+00 cmp 1.200000e+01 ecmp 1.200000e+01 ecmp_rate 3.832456e+03 ecmp_rate/proc 3.832456e+03 vctime 0.000022 mctime 0.000023 cktime 0.000039 intime 0.000069 outtime 0.000044 cpumem 6.720000e+02 gpumem 4.480000e+02 tottime 0.003433
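The element values above can be reproduced by hand. The Python sketch below uses the usual definition of the Proportional Similarity metric, 2*sum(min(a_i,b_i)) / sum(a_i+b_i), which is assumed here rather than taken from the CoMet source; it does match the printed values for this synthetic case.

```python
def czekanowski_2way(a, b):
    """Proportional Similarity: 2 * sum(min(a_i, b_i)) / sum(a_i + b_i)."""
    numerator = 2.0 * sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator

# Vector values as printed by the run above.
vectors = [[1.0, 2.0], [3.0, 4.0], [2.0, 1.0], [4.0, 3.0]]
for j in range(len(vectors)):
    for i in range(j):
        value = czekanowski_2way(vectors[i], vectors[j])
        print(f"element ({i},{j}): value: {value:.6f}")
```

For example, vectors 0 and 1 give 2*(1+2)/(1+2+3+4) = 0.6, and vectors 1 and 3 give 2*(3+3)/14, about 0.857143.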
#--------------------
# Larger case, multiple GPUs.
#--------------------
env OMP_NUM_THREADS=7 $AR_PERFORMANCE_FLAGS \
jsrun --nrs 12 --rs_per_host 6 \
--tasks_per_rs 1 --cpu_per_rs 7 --bind packed:7 --gpu_per_rs 1 -X 1 \
$EXECUTABLE \
--num_field 20000 --num_vector 150000 --num_proc_vector 10 \
--metric_type czekanowski --num_way 2 \
--compute_method GPU --all2all yes --verbosity 1
metrics checksum 95-731337953569731265-108597638988176688 ctime 27.617216 ops 4.950330e+14 ops_rate 1.792480e+13 ops_rate/proc 1.792480e+12 vcmp 1.124992e+10 cmp 2.249985e+14 ecmp 2.249985e+14 ecmp_rate 8.147038e+12 ecmp_rate/proc 8.147038e+11 vctime 0.015438 mctime 0.676193 cktime 50.526713 intime 1.086587 outtime 0.000027 cpumem 3.300012e+10 gpumem 1.080000e+10 tottime 79.922458
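The summary line is a flat sequence of name/value pairs, which makes it easy to post-process. The following Python sketch parses it into a dictionary; the format is inferred from the sample output shown here rather than from a documented specification, and parse_summary is a hypothetical helper name.

```python
def parse_summary(line):
    """Parse a 'metrics checksum ...' summary line into a dict of fields."""
    tokens = line.split()
    if tokens[:2] != ["metrics", "checksum"]:
        raise ValueError("not a metrics summary line")
    result = {"checksum": tokens[2]}
    # The remaining tokens alternate between a field name and a number.
    for name, value in zip(tokens[3::2], tokens[4::2]):
        result[name] = float(value)
    return result

# Abbreviated copy of the summary line printed by the run above.
line = ("metrics checksum 95-731337953569731265-108597638988176688 "
        "ctime 27.617216 ops 4.950330e+14 ops_rate 1.792480e+13 "
        "ops_rate/proc 1.792480e+12 vcmp 1.124992e+10")
fields = parse_summary(line)
print(fields["ops"] / fields["ctime"])   # close to the reported ops_rate
```

In this sample, ops divided by ctime reproduces ops_rate, and ops_rate/proc is one tenth of ops_rate, consistent with --num_proc_vector 10.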
#--------------------
# 3-way Czekanowski metric, small case.
#--------------------
env OMP_NUM_THREADS=7 $AR_PERFORMANCE_FLAGS \
jsrun --nrs 1 --rs_per_host 1 \
--tasks_per_rs 1 --cpu_per_rs 7 --bind packed:7 --gpu_per_rs 1 -X 1 \
$EXECUTABLE \
--num_field 2 --num_vector 4 --num_proc_vector 1 \
--metric_type czekanowski --num_way 3 \
--compute_method GPU --all2all yes --verbosity 3
vec_proc 0 vec 0 field_proc 0 field 0 value 1.000000e+00
vec_proc 0 vec 0 field_proc 0 field 1 value 2.000000e+00
vec_proc 0 vec 1 field_proc 0 field 0 value 3.000000e+00
vec_proc 0 vec 1 field_proc 0 field 1 value 4.000000e+00
vec_proc 0 vec 2 field_proc 0 field 0 value 2.000000e+00
vec_proc 0 vec 2 field_proc 0 field 1 value 1.000000e+00
vec_proc 0 vec 3 field_proc 0 field 0 value 4.000000e+00
vec_proc 0 vec 3 field_proc 0 field 1 value 3.000000e+00
element (0,1,2): value: 6.92307692307692291e-01
element (0,1,3): value: 7.94117647058823484e-01
element (0,2,3): value: 6.92307692307692291e-01
element (1,2,3): value: 7.94117647058823484e-01
metrics checksum 0-84749404949-5538720434861296 ctime 0.003335 ops 1.280000e+02 ops_rate 3.838082e+04 ops_rate/proc 3.838082e+04 vcmp 4.000000e+00 cmp 8.000000e+00 ecmp 8.000000e+00 ecmp_rate 2.398801e+03 ecmp_rate/proc 2.398801e+03 vctime 0.000021 mctime 0.000037 cktime 0.000044 intime 0.000066 outtime 0.000040 cpumem 1.440000e+03 gpumem 9.600000e+02 tottime 0.003641
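The 3-way values can be checked in the same spirit. The sketch below assumes the 3-way form (3/2) * sum(min(a,b) + min(a,c) + min(b,c) - min(a,b,c)) / sum(a + b + c), as described in the publications listed in section 2; it is a plain Python illustration, not CoMet's implementation, and it agrees with the four values printed above.

```python
from itertools import combinations

def czekanowski_3way(a, b, c):
    """3-way Proportional Similarity under the assumed formula in the lead-in."""
    numerator = sum(min(x, y) + min(x, z) + min(y, z) - min(x, y, z)
                    for x, y, z in zip(a, b, c))
    denominator = sum(a) + sum(b) + sum(c)
    return 1.5 * numerator / denominator

# Vector values as printed by the run above.
vectors = [[1.0, 2.0], [3.0, 4.0], [2.0, 1.0], [4.0, 3.0]]
for i, j, k in combinations(range(len(vectors)), 3):
    value = czekanowski_3way(vectors[i], vectors[j], vectors[k])
    print(f"element ({i},{j},{k}): value: {value:.6f}")
```

For vectors 0, 1 and 2 this gives 1.5*6/13, about 0.692308; for vectors 0, 1 and 3 it gives 1.5*9/17, about 0.794118.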
#--------------------
# 2-way CCC metric, small case.
#--------------------
env OMP_NUM_THREADS=7 $AR_PERFORMANCE_FLAGS \
jsrun --nrs 1 --rs_per_host 1 \
--tasks_per_rs 1 --cpu_per_rs 7 --bind packed:7 --gpu_per_rs 1 -X 1 \
$EXECUTABLE \
--num_field 2 --num_vector 4 --num_proc_vector 1 \
--metric_type ccc --num_way 2 \
--compute_method GPU --all2all yes --verbosity 3
vec_proc 0 vec 0 field_proc 0 field 0 value 00
vec_proc 0 vec 0 field_proc 0 field 1 value 01
vec_proc 0 vec 1 field_proc 0 field 0 value 10
vec_proc 0 vec 1 field_proc 0 field 1 value 11
vec_proc 0 vec 2 field_proc 0 field 0 value 01
vec_proc 0 vec 2 field_proc 0 field 1 value 00
vec_proc 0 vec 3 field_proc 0 field 0 value 11
vec_proc 0 vec 3 field_proc 0 field 1 value 10
element (0,1): values: 0 0 4.68750000000000000e-01 0 1 5.62500000000000000e-01 1 0 0.00000000000000000e+00 1 1 4.68750000000000000e-01
element (0,2): values: 0 0 5.62500000000000000e-01 0 1 4.68750000000000000e-01 1 0 4.68750000000000000e-01 1 1 0.00000000000000000e+00
element (1,2): values: 0 0 2.34375000000000000e-01 0 1 3.90625000000000000e-01 1 0 7.03125000000000000e-01 1 1 2.34375000000000000e-01
element (0,3): values: 0 0 2.34375000000000000e-01 0 1 7.03125000000000000e-01 1 0 3.90625000000000000e-01 1 1 2.34375000000000000e-01
element (1,3): values: 0 0 0.00000000000000000e+00 0 1 4.68750000000000000e-01 1 0 4.68750000000000000e-01 1 1 5.62500000000000000e-01
element (2,3): values: 0 0 4.68750000000000000e-01 0 1 5.62500000000000000e-01 1 0 0.00000000000000000e+00 1 1 4.68750000000000000e-01
metrics checksum 0-245201878478-801640733671948288 ctime 0.003012 ops 0.000000e+00 ops_rate 0.000000e+00 ops_rate/proc 0.000000e+00 vcmp 6.000000e+00 cmp 4.800000e+01 ecmp 1.200000e+01 ecmp_rate 3.984141e+03 ecmp_rate/proc 3.984141e+03 vctime 0.000022 mctime 0.000021 cktime 0.000030 intime 0.000071 outtime 0.000115 cpumem 1.024000e+03 gpumem 7.040000e+02 tottime 0.003367
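As a cross-check of the table values above, the following Python sketch evaluates a 2-way CCC table for the non-sparse case. The formula, multiplier * f_ij(a,b) * (1 - param*f_i(a)) * (1 - param*f_j(b)) with bit frequencies f, is inferred from the --ccc_multiplier and --ccc_param defaults and from the publications in section 2; it reproduces this sample output but should not be taken as CoMet's actual code path.

```python
def ccc_2way(vec_i, vec_j, multiplier=4.5, param=2.0 / 3.0):
    """Return {(a, b): value} for the 2x2 table of two vectors of 2-bit entries.

    Non-sparse case only: every 2-bit entry contributes two bits.
    """
    num_field = len(vec_i)

    def count(entry, bit):
        # Number of the two bits in a 2-bit entry that equal `bit`.
        ones = (entry >> 1) + (entry & 1)
        return ones if bit == 1 else 2 - ones

    table = {}
    for a in (0, 1):
        f_i = sum(count(e, a) for e in vec_i) / (2.0 * num_field)
        for b in (0, 1):
            f_j = sum(count(e, b) for e in vec_j) / (2.0 * num_field)
            f_ij = sum(count(x, a) * count(y, b)
                       for x, y in zip(vec_i, vec_j)) / (4.0 * num_field)
            table[(a, b)] = multiplier * f_ij * (1 - param * f_i) * (1 - param * f_j)
    return table

# Vectors 0 and 1 as printed by the run above.
for (a, b), value in ccc_2way([0b00, 0b01], [0b10, 0b11]).items():
    print(a, b, f"{value:.6f}")
```

For element (0,1) this yields 0.468750, 0.562500, 0.000000 and 0.468750 for table entries (0,0), (0,1), (1,0) and (1,1).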
#--------------------
# Larger case, using tensor cores.
#--------------------
env OMP_NUM_THREADS=7 $AR_PERFORMANCE_FLAGS \
jsrun --nrs 12 --rs_per_host 6 \
--tasks_per_rs 1 --cpu_per_rs 7 --bind packed:7 --gpu_per_rs 1 -X 1 \
$EXECUTABLE \
--num_field 400000 --num_vector 100000 --num_proc_vector 10 \
--metric_type ccc --num_way 2 \
--compute_method GPU --tc 1 --num_tc_steps 4 --all2all yes --verbosity 1
metrics checksum 138-217645116196906322-630503947831869440 ctime 23.484884 ops 1.760000e+16 ops_rate 7.494182e+14 ops_rate/proc 7.494182e+13 vcmp 4.999950e+09 cmp 7.999920e+15 ecmp 1.999980e+15 ecmp_rate 8.516031e+13 ecmp_rate/proc 8.516031e+12 vctime 0.209915 mctime 0.434914 cktime 150.661019 intime 15.280343 outtime 0.000036 cpumem 2.480000e+10 gpumem 1.420256e+10 tottime 190.071566
#--------------------
# 3-way CCC metric, small case.
#--------------------
env OMP_NUM_THREADS=7 $AR_PERFORMANCE_FLAGS \
jsrun --nrs 1 --rs_per_host 1 \
--tasks_per_rs 1 --cpu_per_rs 7 --bind packed:7 --gpu_per_rs 1 -X 1 \
$EXECUTABLE \
--num_field 2 --num_vector 4 --num_proc_vector 1 \
--metric_type ccc --num_way 3 \
--compute_method GPU --all2all yes --verbosity 3
vec_proc 0 vec 0 field_proc 0 field 0 value 00
vec_proc 0 vec 0 field_proc 0 field 1 value 01
vec_proc 0 vec 1 field_proc 0 field 0 value 10
vec_proc 0 vec 1 field_proc 0 field 1 value 11
vec_proc 0 vec 2 field_proc 0 field 0 value 01
vec_proc 0 vec 2 field_proc 0 field 1 value 00
vec_proc 0 vec 3 field_proc 0 field 0 value 11
vec_proc 0 vec 3 field_proc 0 field 1 value 10
element (0,1,2): values: 0 0 0 1.17187500000000000e-01 0 0 1 1.95312500000000000e-01 0 1 0 2.10937500000000000e-01 0 1 1 1.17187500000000000e-01 1 0 0 0.00000000000000000e+00 1 0 1 0.00000000000000000e+00 1 1 0 2.34375000000000000e-01 1 1 1 0.00000000000000000e+00
element (0,1,3): values: 0 0 0 0.00000000000000000e+00 0 0 1 2.34375000000000000e-01 0 1 0 1.17187500000000000e-01 0 1 1 2.10937500000000000e-01 1 0 0 0.00000000000000000e+00 1 0 1 0.00000000000000000e+00 1 1 0 1.95312500000000000e-01 1 1 1 1.17187500000000000e-01
element (0,2,3): values: 0 0 0 1.17187500000000000e-01 0 0 1 2.10937500000000000e-01 0 1 0 0.00000000000000000e+00 0 1 1 2.34375000000000000e-01 1 0 0 1.95312500000000000e-01 1 0 1 1.17187500000000000e-01 1 1 0 0.00000000000000000e+00 1 1 1 0.00000000000000000e+00
element (1,2,3): values: 0 0 0 0.00000000000000000e+00 0 0 1 1.17187500000000000e-01 0 1 0 0.00000000000000000e+00 0 1 1 1.95312500000000000e-01 1 0 0 2.34375000000000000e-01 1 0 1 2.10937500000000000e-01 1 1 0 0.00000000000000000e+00 1 1 1 1.17187500000000000e-01
metrics checksum 0-20352394550-918734323983581184 ctime 0.003430 ops 0.000000e+00 ops_rate 0.000000e+00 ops_rate/proc 0.000000e+00 vcmp 4.000000e+00 cmp 6.400000e+01 ecmp 8.000000e+00 ecmp_rate 2.332274e+03 ecmp_rate/proc 2.332274e+03 vctime 0.000021 mctime 0.000041 cktime 0.000054 intime 0.000062 outtime 0.000133 cpumem 1.472000e+03 gpumem 8.320000e+02 tottime 0.003840
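The 3-way tables can be checked with a similar sketch. The Python code below assumes the 3-way value is multiplier * f_ijk(a,b,c) * (1 - param*f_i(a)) * (1 - param*f_j(b)) * (1 - param*f_k(c)), with the same default multiplier 4.5 and param 2/3; this form is inferred from the sample output above (non-sparse case) and is not taken from the CoMet source.

```python
from itertools import product

def bit_count(entry, bit):
    """Number of the two bits in a 2-bit entry that equal `bit`."""
    ones = (entry >> 1) + (entry & 1)
    return ones if bit == 1 else 2 - ones

def ccc_3way(vec_i, vec_j, vec_k, multiplier=4.5, param=2.0 / 3.0):
    """Return {(a, b, c): value} for the 2x2x2 table of three 2-bit vectors."""
    vecs = (vec_i, vec_j, vec_k)
    num_field = len(vec_i)
    table = {}
    for bits in product((0, 1), repeat=3):
        # Joint frequency: 8 bit combinations per field.
        joint = sum(bit_count(x, bits[0]) * bit_count(y, bits[1]) * bit_count(z, bits[2])
                    for x, y, z in zip(*vecs)) / (8.0 * num_field)
        value = multiplier * joint
        for vec, bit in zip(vecs, bits):
            freq = sum(bit_count(e, bit) for e in vec) / (2.0 * num_field)
            value *= 1 - param * freq
        table[bits] = value
    return table

# Vectors 0, 1 and 2 as printed by the run above.
for bits, value in ccc_3way([0b00, 0b01], [0b10, 0b11], [0b01, 0b00]).items():
    print(*bits, f"{value:.7f}")
```

For element (0,1,2), table entry (0,0,0) evaluates to 0.1171875 and entry (0,1,0) to 0.2109375, matching the printed output.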