Version 0.4.0
- Enabled arbitrary volumes for all Dirac operators.
- Added auto-tuning for the Dirac operators. This is controlled by
the new QudaInvertParam member 'dirac_tune'. For multiple inversions,
the tuning can be preserved between solves by setting
'preserve_dirac'.
- Added debug compile options to make.inc.example to allow for easier
debugging.
- Added CPU versions of all blas functions and added unit tests for
each one to tests/blas_test.cu.
- Added support for using multiple GPUs in parallel via MPI or QMP.
[TODO: details]

Version 0.3.2 - 18 January 2011
- Fixed a regression in 0.3.1 that prevented the BiCGstab solver from
working correctly with half precision on Fermi.

Version 0.3.1 - 22 December 2010
- Added support for domain wall fermions. The length of the fifth
dimension and the domain wall height are set via the 'Ls' and 'm5'
members of QudaInvertParam. Note that the convention is to include
the minus sign in m5 (e.g., m5 = -1.8 would be a typical value).
- Added support for twisted mass fermions. The twisted mass parameter
and flavor are set via the 'mu' and 'twist_flavor' members of
QudaInvertParam. Similar to clover fermions, both symmetric and
asymmetric even/odd preconditioning are supported. The symmetric
case is better optimized and generally also exhibits faster
convergence.
- Improved performance in several of the BLAS routines, particularly
on Fermi.
- Improved performance in the CG solver for Wilson-like (and domain
wall) fermions by avoiding unnecessary allocation and deallocation
of temporaries, at the expense of increased memory usage. This will
be improved in a future release.
- Enabled optional building of Dirac operators, set in make.inc, to
keep build time in check.
- Added declaration for MatDagMatQuda() to the quda.h header file and
removed the non-existent functions MatPCQuda() and
MatPCDagMatPCQuda(). The latter two functions have been absorbed
into MatQuda() and MatDagMatQuda(), respectively, since
preconditioning may be selected via the solution_type member of
QudaInvertParam.
- Fixed a bug in the Wilson and Wilson-clover Dirac operators that
prevented the use of MatPC solution types.
- Fixed a bug in the Wilson and Wilson-clover Dirac operators that
would cause a crash when QUDA_MASS_NORMALIZATION is used.
- Fixed an allocation bug in the Wilson and Wilson-clover
Dirac operators that might have led to undefined behavior for
non-zero padding.
- Fixed a bug in blas_test that might have led to incorrect autotuning
for the copyCuda() routine.
- Various internal changes: removed temporary cudaColorSpinorField
argument to solver functions; modified blas functions to use C++
complex<double> type instead of cuDoubleComplex type; improved code
hygiene by ensuring that all textures are bound in dslash_quda.cu
and unbound after kernel execution; etc.

Version 0.3.0 - 1 October 2010
- CUDA 3.0 or later is now required to build the library.
- Several changes have been made to the interface that require setting
new parameters in QudaInvertParam and QudaGaugeParam. See below for
details.
- The internals of QUDA have been significantly restructured to facilitate
future extensions. This is an ongoing process and will continue
through the next several releases.
- The inverters might require more device memory than they did before.
This will be corrected in a future release.
- The CG inverter now supports improved staggered fermions (asqtad or
HISQ). Code has also been added for asqtad link fattening, the asqtad
fermion force, and the one-loop improved Symanzik gauge force, but
these are not yet exposed through the interface in a consistent way.
- A multi-shift CG solver for improved staggered fermions has been
added, callable via invertMultiShiftQuda(). This function does not
yet support Wilson or Wilson-clover.
- It is no longer possible to mix different precisions for the
spinors, gauge field, and clover term (where applicable). In other
words, it is required that the 'cuda_prec' member of QudaGaugeParam
match both the 'cuda_prec' and 'clover_cuda_prec' members of
QudaInvertParam, and likewise for the "sloppy" variants. This
change has greatly reduced the time and memory required to build the
library.
- Added 'solve_type' to QudaInvertParam. This determines how the linear
system is solved, in contrast to solution_type, which determines what
system is being solved. When using the CG inverter, solve_type should
generally be set to 'QUDA_NORMEQ_PC_SOLVE', which will solve the
even/odd-preconditioned normal equations via CGNR. (The full
solution will be reconstructed if necessary based on solution_type.)
For BiCGstab, 'QUDA_DIRECT_PC_SOLVE' is generally best. These choices
correspond to what was done by default in earlier versions of QUDA.
- Added 'dagger' option to QudaInvertParam. If 'dagger' is set to
QUDA_DAG_YES, then the matrices appearing in the chosen solution_type
will be conjugated when determining the system to be solved by
invertQuda() or invertMultiShiftQuda(). This option must also be set
(typically to QUDA_DAG_NO) before calling dslashQuda(), MatPCQuda(),
MatPCDagMatPCQuda(), or MatQuda().
- Eliminated 'dagger' argument to dslashQuda(), MatPCQuda(), and MatQuda()
in favor of the new 'dagger' member of QudaInvertParam described above.
- Removed the unused blockDim and blockDim_sloppy members from
QudaInvertParam.
- Added 'type' parameter to QudaGaugeParam. For Wilson or Wilson-clover,
this should be set to QUDA_WILSON_LINKS.
- The dslashQuda() function now takes an argument of type
QudaParityType to determine the parity (even or odd) of the output
spinor. This was previously specified by an integer.
- Added support for loading all elements of the gauge field matrices,
without SU(3) reconstruction. Set the 'reconstruct' member of
QudaGaugeParam to 'QUDA_RECONSTRUCT_NO' to select this option, but note
that it should not be combined with half precision unless the
elements of the gauge matrices are bounded by 1. This restriction
will be removed in a future release.
- Renamed dslash_test to wilson_dslash_test, renamed invert_test to
wilson_invert_test, and added staggered variants of these test
programs.
- Improved performance of the half-precision Wilson Dslash.
- Temporarily removed 3D Wilson Dslash.
- Added an 'OS' option to make.inc.example, to simplify compiling for
Mac OS X.
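
The precision-matching rule and the solve_type recommendations in this
release can be sketched in C as follows. This is an illustration only:
every type and function name prefixed with "sketch_" is hypothetical and
stands in for the real definitions in quda.h, and the member names for
the "sloppy" variants (cuda_prec_sloppy, clover_cuda_prec_sloppy) are
assumed rather than taken from this file.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in types for illustration only; the real definitions live in
   quda.h and may differ. */
typedef enum { SKETCH_PREC_HALF, SKETCH_PREC_SINGLE, SKETCH_PREC_DOUBLE } sketch_precision;
typedef enum { SKETCH_CG, SKETCH_BICGSTAB } sketch_inverter;
typedef enum { SKETCH_NORMEQ_PC_SOLVE, SKETCH_DIRECT_PC_SOLVE } sketch_solve_type;

typedef struct {
  sketch_precision cuda_prec;
  sketch_precision cuda_prec_sloppy;        /* assumed member name */
} sketch_gauge_param;

typedef struct {
  sketch_precision cuda_prec;
  sketch_precision cuda_prec_sloppy;        /* assumed member name */
  sketch_precision clover_cuda_prec;
  sketch_precision clover_cuda_prec_sloppy; /* assumed member name */
  sketch_inverter inverter;
  sketch_solve_type solve_type;
} sketch_invert_param;

/* True when the gauge precision matches the spinor and clover
   precisions, for both the full and the sloppy variants. */
static bool precisions_match(const sketch_gauge_param *g,
                             const sketch_invert_param *p)
{
  return g->cuda_prec == p->cuda_prec
      && g->cuda_prec == p->clover_cuda_prec
      && g->cuda_prec_sloppy == p->cuda_prec_sloppy
      && g->cuda_prec_sloppy == p->clover_cuda_prec_sloppy;
}

/* CG: preconditioned normal equations; BiCGstab: direct preconditioned
   solve, following the recommendations stated above. */
static sketch_solve_type recommended_solve_type(sketch_inverter inv)
{
  return inv == SKETCH_CG ? SKETCH_NORMEQ_PC_SOLVE : SKETCH_DIRECT_PC_SOLVE;
}

/* Reject mixed precisions, then fill in the recommended solve type. */
static void apply_defaults(const sketch_gauge_param *g, sketch_invert_param *p)
{
  assert(precisions_match(g, p));
  p->solve_type = recommended_solve_type(p->inverter);
}
```

A caller would fill in both parameter structs, call apply_defaults(),
and only then hand the real structures to the library. The check uses
assert() for brevity; production code would more likely report an error.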

Version 0.2.5 - 24 June 2010
- Fixed a regression in 0.2.4 that prevented the library from compiling
when GPU_ARCH was set to sm_10, sm_11, or sm_12.

Version 0.2.4 - 22 June 2010
- Added initial support for CUDA 3.x and Fermi (not yet optimized).
- Incorporated look-ahead strategy to increase stability of the BiCGstab
inverter.
- Added definition of QUDA_VERSION to quda.h. This is an integer with
two digits for each of the major, minor, and subminor version
numbers. For example, QUDA_VERSION is 000204 for this release.
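
The two-digits-per-field layout of QUDA_VERSION can be unpacked with
integer division, as in the C sketch below. The helper names and the
struct are hypothetical and not part of quda.h, and the sketch assumes
the value is the decimal number major*10000 + minor*100 + subminor.
Note that a C integer literal written with leading zeros is parsed as
octal, so the example works with the plain decimal value 204 rather
than the literal 000204.

```c
#include <assert.h>

/* Hypothetical helpers, not part of quda.h. They assume the version is
   stored as the decimal value major*10000 + minor*100 + subminor. */
typedef struct {
  int major_version;
  int minor_version;
  int subminor_version;
} version_parts;

/* Split a packed version number into its three two-digit fields. */
static version_parts decode_version(int version)
{
  assert(version >= 0 && version <= 999999); /* three two-digit fields */
  version_parts v;
  v.major_version = version / 10000;
  v.minor_version = (version / 100) % 100;
  v.subminor_version = version % 100;
  return v;
}

/* Pack three fields back into a single decimal version number. */
static int encode_version(int major_version, int minor_version,
                          int subminor_version)
{
  return major_version * 10000 + minor_version * 100 + subminor_version;
}
```

Because the fields are packed in order of significance, two versions
encoded this way can also be compared directly with the usual integer
comparison operators.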

Version 0.2.3 - 2 June 2010
- Further improved performance of the blas routines.
- Added 3D Wilson Dslash in anticipation of temporal preconditioning.

Version 0.2.2 - 16 February 2010
- Fixed a bug that prevented reductions (and hence the inverter) from working
correctly in emulation mode.

Version 0.2.1 - 8 February 2010
- Fixed a bug that would sometimes cause the inverter to fail when spinor
padding is enabled.
- Significantly improved performance of the blas routines.

Version 0.2 - 16 December 2009
- Introduced new interface functions newQudaGaugeParam() and
newQudaInvertParam() to allow for enhanced error checking. See
invert_test for an example of their use.
- Added auto-tuning blas to improve performance (see README for details).
- Improved stability of the half precision 8-parameter SU(3)
reconstruction (with thanks to Guochun Shi).
- Cleaned up the invert_test example to remove unnecessary dependencies.
- Fixed a bug affecting saveGaugeQuda() that caused su3_test to fail.
- Tuned parameters to improve performance of the half-precision clover
Dslash on sm_13 hardware.
- Formally adopted the MIT/X11 license.

Version 0.1 - 17 November 2009
- Initial public release.