-
Notifications
You must be signed in to change notification settings - Fork 0
/
ChangeLog
181 lines (113 loc) · 5.8 KB
/
ChangeLog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
0.9.0
Making parallelization, tiling, and other transformations work with bands
and poly scattering tree prefixes as opposed to depths. Mark parallel and
vectorizable loops via clast processing instead of scripts. All of this
gets rid of a long-standing limitation.
See git history for all changes.
0.8.1
Minor bug fixes; see git log. Bail out with identity transformation on a
failure.
0.8.0
Make it easier for users to apply their own transformations. Separate
transformation porperty detection from pluto_auto_transform. See git history
for all changes.
0.7.0
From git commit 'Update version to 0.7.0'
8cce20118b895b0ea538df65894fc7c85806d1c9
See git history for a log of changes. Bug fixes, better data structuring, more
modular.
0.6.0 BETA
1. Switch to Cloog-isl (from Cloog-polylib). ISL (Author: Sven Verdoolaege) is
integer-aware for its polyhedral operations unlike polylib, and relies on
different algorithms/machinery for its operations. Cloog with ISL
as backend leads to better code generation and code. (1) Generated code is of much
better quality (simpler bounds, simpler conditionals, shorter cleanup code, no spurious
integer empty domains), (2) no value overflows when handling deep loop nests or large
tile sizes, and (3) faster code generation typically for complex cases (deep loop nests,
complex transformations, large tile sizes and with both L1 and L2 tiling).
2. Pluto core bug fix (bounding function problem in certain cases)
3. Do not put any context on parameters by default (use --context
for minimum value on parameters).
4. Bug fix in orthogonal subspace computation
5. Some new examples
0.5.1 BETA
1. Minor bug fix: Bug in generation of OpenMP pragma in some cases (when the first hyperplane
is a parallel scalar dimension)
2. Minor bug fix: Bug in setting tile sizes from 'tile.sizes' when a permutable band
contains a scalar dimension. No tile size should have been read for the
scalar dimension.
3. Orio bug fix for unroll-jamming non-rectangular code
4. Minor inscop fix
5. Pluto core bug fix (bounding function problem in certain cases)
6. Support for privatization: thanks to Louis-Noel Pouchet; almost
everything is done in Candl
7. Do not put any context on parameters by default (use --context
for minimum value on parameters).
8. Option --nofuse now behaves correctly: distributing at inner levels too
9. Minor bug fix in orthogonal subspace computation
0.5.0 BETA
1. New polyhedral frontend and dependence tester (Clan/Candl) integrated.
All LooPo components are out of Pluto now. Thanks to Clan, affine
conditionals are now handled, besides pure function calls (assumed to be
pure). Speed of the tool has also increased with this due to a much faster
dependence tester. Use #pragma scop and #pragma endscop to demarcate
portions to be extracted and optimized. Parameters are automatically
detected. /* pluto start <parameter list> */ /* pluto end */ are no longer
needed or supported.
2. Command-line options can now be specified at any position, before or after
the file name
3. --indent option to indent output code
4. Several minor bug fixes
0.4.2 BETA
* Minor bug fixes
0.4.1 BETA
* Build issues on Mac OS X resolved
* Moved to Piplib 1.3.7
* Support for Bee+Cl@k tool with option --bee. Pragmas are inserted that the
Bee tool can handle and optimize for storage.
0.4.0 BETA
* Computation of transformations is much faster now (up to 7x in some cases)
due to optimizations in building the formulation, and better memory
allocation. SPECFP2000 swim code can be transformed (end-to-end) in 11s
instead of 41s with transformation computation time reduced to 2.8s from
31.8s.
* Band detection for tiling now more advanced (dependences satisfied at
scalar dimensions are skipped to allow identifying tilable bands across scalar
dimensions: the scalar dimension is not preserved when the band is tiled. See
test/matmul-seq.c or test/doitgen.c tiling for example
* Transformed iterator names changed to t0, t1, t2, ... from c1, c2, ..
* Transformed iterator names are indexed 0 to <num-1> instead of <1> to <num>
* Added plutune script that just steps through the command-line space of Pluto
and runs each version of the code: can be used to figure out the best set of
command-line options for a code. Just run with "./plutune <source.c>"
* Some register tiling bugs fixed (affected a few cases)
0.3.0 BETA
* GCC's preprocessor is now run on the Cloog code to replace statement text
* Ability to complete partially specified transformations through a '.precut'
file. User can himself specify a .precut file if a particular custom fusion
structure or a partial transformation should be used to start from. Pluto can
then complete it; also, allows integration with the Letsee tool which can
enumerate .precut files.
* Inscop fixed so that includes are not inserted as part of the optimized
program section but at the top of the file (Thanks to Piotr and Louis-Noël)
* Bug in computing SCCs fixed
* Bugs in register tiling fixed
* Changes to --maxfuse and --smartfuse
* Unroll and unroll-jam factor can be set with --ufactor=<factor>; 8 is the
default
* Code cleanup (less worse now)
0.2.0 BETA
* Added --smartfuse option (this is the default), --nofuse and --maxfuse
are the other two
* Added 'install'. Installation can now be done with './install'
* Fixed compilation on 32-bit systems due to libpolylib64.a not being built
* PipLib, PolyLib, and Cloog are now included. No need to download or install
them separately
* Fusion algorithm implementation for the general case is now complete (but
still not tuned). Custom fusion structures can be forced with the .fst file
* --nofuse option now to treat all SCCs separately (no fusion across SCCs even
if it is possible)
* Changes to LooPo's C++ sources to make everything compile with GCC 4.3.0
* Lots of other minor fixes
0.1.0 BETA
First release