Using ExaML on the Intel Xeon Phi (Knights Landing) coprocessors
Compiling under Linux
---------------------
Please set your MPI/MIC environment (ask your sysadmin if unsure) and then run:
cd examl
make -f Makefile.KNL.icc
This will create an executable named examl-KNL.
Running
----------------------
1. Use parse-examl to generate a binary alignment file as usual
2. You can run examl-KNL in OpenMP-only, MPI-only, or hybrid OpenMP/MPI mode
(see the sample commands below). Unlike with KNC, I didn't notice any significant performance
differences between these three configurations, so just choose whichever is most convenient:
OpenMP:
OMP_NUM_THREADS=128 ./examl-KNL -s myTest.binary -m GAMMA -t myStart.tre -n myTest
MPI:
mpirun -n 128 -env OMP_NUM_THREADS 1 ./examl-KNL -s myTest.binary -m GAMMA -t myStart.tre -n myTest
Hybrid:
mpirun -n 8 -env OMP_NUM_THREADS 16 ./examl-KNL -s myTest.binary -m GAMMA -t myStart.tre -n myTest
NOTE: although KNL has 4 logical threads/core, it usually doesn't make sense to use more than 2 threads/core
with ExaML (since ExaML doesn't benefit from hyper-threading). Furthermore, please consider the general
recommendations regarding the number of alignment patterns per core given in the ExaML manual.
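The thread counts in the sample commands above follow from simple arithmetic: MPI ranks times OpenMP threads per rank should equal cores times threads per core. The helper below is only a sketch of that calculation. The name knl_layout is hypothetical (it is not part of ExaML), and the 64-core figure is an assumption chosen to match the 128 threads used in the samples, so check the core count of your own card.

```shell
# Hypothetical helper, not part of ExaML.
# Usage: knl_layout CORES THREADS_PER_CORE MPI_RANKS
# Prints the OMP_NUM_THREADS value that spreads CORES * THREADS_PER_CORE
# threads evenly over MPI_RANKS ranks; fails if the split is uneven.
knl_layout() {
    cores=$1
    tpc=$2
    ranks=$3
    total=$((cores * tpc))
    if [ "$ranks" -le 0 ] || [ $((total % ranks)) -ne 0 ]; then
        echo "error: $ranks ranks do not evenly divide $total threads" >&2
        return 1
    fi
    echo $((total / ranks))
}

# Assumed 64-core card at 2 threads/core, hybrid run with 8 MPI ranks.
knl_layout 64 2 8    # prints 16
```

With 128 ranks the same call prints 1, which corresponds to the MPI-only sample command.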
3. IMPORTANT: KNL on-card memory can be configured in one of two modes: "Flat" or "Cache".
You should find out which one is set on your card(s), since it has important performance implications.
- in "Cache" mode, no explicit memory binding is needed, since the on-card memory acts as a transparent cache
- in "Flat" mode, you should use numactl to explicitly bind the ExaML processes to the fast memory NUMA domain:
numactl --membind=1 mpirun -n 128 -env OMP_NUM_THREADS 1 ./examl-KNL -s myTest.binary -m GAMMA -t myStart.tre -n myTest
Binding to the fast memory is not possible if the memory requirements of your analysis exceed the on-card memory size (typically 16GB).
In this case, you should either switch to "Cache" mode or let ExaML run in the (slow) main memory. The latter option
will typically incur a large performance penalty (up to 5x) and is thus not recommended.
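One possible way to check which mode is active, assuming numactl is installed: in "Flat" mode the on-card memory usually shows up in the output of `numactl --hardware` as a NUMA node that has memory but no CPUs (node 1 in the command above). The sketch below looks for such nodes. The function name memory_only_nodes is hypothetical, and the sample text is made up for illustration, not captured from a real card.

```shell
# Hypothetical helper, not part of ExaML or numactl.
# Reads `numactl --hardware` output on stdin and prints the IDs of
# NUMA nodes whose "cpus:" line lists no CPUs.
memory_only_nodes() {
    awk '/^node [0-9]+ cpus:/ && NF == 3 { print $2 }'
}

# Made-up sample resembling what a card in "Flat" mode might report.
sample='available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 98200 MB
node 1 cpus:
node 1 size: 16384 MB'

printf '%s\n' "$sample" | memory_only_nodes    # prints 1

# On a real system you would run:
#   numactl --hardware | memory_only_nodes
```

If nothing is printed, no CPU-less node was found, which is what you would expect in "Cache" mode. Other cluster configurations may expose several such nodes, in which case the node number passed to --membind has to be adjusted accordingly.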
Limitations & caveats
---------------------
1. Supported on the MIC:
+ DNA and AA alignments
+ GAMMA model of rate heterogeneity
+ multiple partitions
+ all AA substitution matrices supported by ExaML, including LG4
2. Currently NOT supported:
- binary and generic multi-state alignments
- PSR model
- memory saving for gappy alignments (-S option)
3. Performance
ExaML-MIC performs best on alignments with a large number of sites and few taxa.
The latter is due to the limited on-card memory of the MICs (see above), so you
might need to use multiple cards if the number of taxa is large.
For details, please refer to: http://www.hicomb.org/papers/HICOMB2014-04.pdf and
https://doi.org/10.1093/bioinformatics/btv184
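Some of the unsupported features listed in item 2 can be caught before a job is submitted. The sketch below is a hypothetical pre-flight check (check_knl_args is not part of ExaML) that rejects the -S option and the PSR model. It assumes that PSR is selected with "-m PSR", in the same way the sample commands select GAMMA with "-m GAMMA". Unsupported data types (binary and generic multi-state alignments) cannot be detected from the command line and are not covered.

```shell
# Hypothetical pre-flight check, not part of ExaML.
# Returns non-zero if the arguments use an option listed above as unsupported.
check_knl_args() {
    prev=""
    for arg in "$@"; do
        if [ "$arg" = "-S" ]; then
            echo "unsupported on KNL: -S (memory saving for gappy alignments)" >&2
            return 1
        fi
        # Assumption: the PSR model is requested as "-m PSR".
        if [ "$prev" = "-m" ] && [ "$arg" = "PSR" ]; then
            echo "unsupported on KNL: PSR model" >&2
            return 1
        fi
        prev=$arg
    done
    return 0
}

check_knl_args -s myTest.binary -m GAMMA -t myStart.tre -n myTest \
    && echo "arguments look OK"
```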
Contact & Support
--------------------
Please use the RAxML Google group to ask questions:
https://groups.google.com/forum/?hl=en#!forum/raxml