Releases: diku-dk/futhark
0.18.3
0.18.2
Added
- The GPU loop tiler can now handle loops where only a subset of the
  input arrays are tiled. Matrix-vector multiplication is one
  important program where this helps (#1145).
- The number of threads used by the multicore backend is now
  configurable (--num-threads and
  futhark_context_config_set_num_threads()). (#1162)
Fixed
- PyOpenCL backend would mistakenly still treat entry point
  argument sizes as 32 bit.
- Warnings are now reported even for programs with type errors.
- Multicore backend now works properly for very large iteration
  spaces.
- A few internal generated functions (init_constants(),
  free_constants()) were mistakenly declared non-static.
- Process exit code is now nonzero when compiler bugs and
  limitations are encountered.
- Multicore backend crashed on reduce_by_index with nonempty target
  and empty input.
- Fixed a flattening issue for certain complex map nestings
  (#1168).
- Made API function futhark_context_clear_caches() thread-safe
  (#1169).
- API functions for freeing opaque objects are now thread-safe
  (#1169).
- Tools such as futhark dataset no longer crash with an internal
  error if writing to a broken pipe (but they will return a nonzero
  exit code).
- Defunctionalisation had a name shadowing issue that would crop up
  for programs making very advanced use of functional
  representations (#1174).
- Type checker erroneously permitted pattern-matching on string
  literals (this would fail later in the compiler).
- New coverage checker for pattern matching, which is more correct.
  However, it may not provide quite as nice counter-examples
  (#1134).
- Fix rare internalisation error (#1177).
0.16.5
0.18.1
0.17.3
Added
- Improved parallelisation of futhark bench compilation.
Fixed
- Dataset generation for test programs now uses the right futhark
  executable (#1133).
- Really fix NaN comparisons in interpreter (#1070, again).
- Fix entry points with a parameter that is a sum type where
  multiple constructors contain arrays of the same statically known
  size.
- Fix in monomorphisation of types with constant sizes.
- Fix in in-place lowering (#1142).
- Fix tiling inside multiple nested loops (#1143).
0.17.2
Added
- Obscure loop optimisation (#1110).
- Faster matrix transposition in C backend.
- Library code generated with CUDA backend can now be called from
  multiple threads.
- Better optimisation of concatenations of array literals and
  replicates.
- Array creation C API functions now accept const pointers.
- Arrays can now be indexed (but not sliced) with any signed integer
  type (#1122).
- Added --list-devices command to OpenCL binaries (#1131).
- Added --help command to C, CUDA and OpenCL binaries (#1131).
Removed
- The integer modules no longer contain iota and replicate
  functions. The top-level ones still exist.
- The size module type has been removed from the prelude.
Changed
- Range literals may no longer be produced from unsigned integers.
Fixed
- Entry points with names that are not valid C (or Python)
  identifiers are now pointed out as problematic, rather than
  generating invalid C code.
- Exotic tiling bug (#1112).
- Missing synchronisation for in-place updates at group level.
- Fixed (in a hacky way) an issue where reduce_by_index would use
  too much local memory on AMD GPUs when using the OpenCL backend.
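
To make the entry point naming fix above concrete, here is a minimal sketch of the kind of check involved. The helper `looks_like_c_identifier` is hypothetical and is not taken from the Futhark compiler; it only tests the basic shape of a C identifier and deliberately ignores reserved words.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Futhark: returns true if `name` has the
   shape of a C identifier, i.e. a letter or underscore followed by letters,
   digits, or underscores. Reserved words such as "int" are NOT rejected. */
bool looks_like_c_identifier(const char *name) {
  if (name == NULL || name[0] == '\0') {
    return false;
  }
  if (!(isalpha((unsigned char)name[0]) || name[0] == '_')) {
    return false;
  }
  for (size_t i = 1; name[i] != '\0'; i++) {
    if (!(isalnum((unsigned char)name[i]) || name[i] == '_')) {
      return false;
    }
  }
  return true;
}
```

A name such as main' (with a trailing apostrophe) is fine in Futhark source but fails this check, which is the sort of entry point name that is now flagged instead of producing invalid C.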
0.16.4
Added
- #[unroll] attribute.
- Better error message when writing a[i][j] (#1095).
- Better error message when missing "in" (#1091).
Fixed
- Fixed compiler crash on certain patterns of nested parallelism
  (#1068, #1069).
- NaN comparisons are now done properly in interpreter (#1070).
- Fix incorrect movement of array indexing into the branches of if
  expressions (#1073).
- Fix defunctorisation bug (#1088).
- Fix issue where loop tiling might generate out-of-bounds reads
  (#1094).
- Scans of empty arrays no longer result in out-of-bounds memory
  reads.
- Fix yet another defunctionalisation bug due to missing
  eta-expansion (#1100).
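
The NaN comparison fix above does not spell out the intended semantics. Assuming it refers to the usual IEEE 754 rules (an assumption, not something this changelog states), the expected behaviour can be sketched in C as follows. Both function names are made up for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* Illustrative only. Under IEEE 754, NaN is the one value that is not
   equal to itself, so this returns true exactly when x is NaN. */
bool is_nan_by_self_compare(double x) {
  return x != x;
}

/* Ordered comparison: the result is false whenever either operand is NaN. */
bool less_than(double a, double b) {
  return a < b;
}
```

With these rules, `less_than(NAN, 1.0)` and `less_than(1.0, NAN)` are both false, so code that assumes one of `a < b` or `a >= b` always holds can break on NaN inputs.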
0.16.3
Added
- random input blocks for futhark test and futhark bench now
  support floating-point literals, which must always have either an
  f32 or f64 suffix.
- The cuda backend now supports the -d option for executables.
- The integer modules now contain a ctz function for counting
  trailing zeroes.
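
For readers unfamiliar with the operation behind the new ctz function, the following is a reference sketch in C of counting trailing zero bits for a 32-bit value. It is not the Futhark implementation; the name `ctz32` is hypothetical, and returning 32 for a zero input is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical reference model: counts the zero bits below the lowest set
   bit by shifting right until bit 0 is set.
   ASSUMPTION: a zero input yields the bit width (32). */
int ctz32(uint32_t x) {
  if (x == 0) {
    return 32;
  }
  int n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    n++;
  }
  return n;
}
```

For example, 8 is binary 1000, so `ctz32(8)` is 3, and any odd number gives 0.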
Fixed
- The pyopencl backend now works with OpenCL devices that have
  multiple types (most importantly, oclgrind).
- Fix barrier divergence when generating code for group-level
  collective copies in GPU backend.
- Intra-group flattening now looks properly inside of branches.
- Intra-group flattened code versions are no longer used when the
  resulting workgroups would have fewer than 32 threads (with
  default thresholds anyway) (#1064).
0.16.2
Added
- futhark autotune: added --pass-option.
Fixed
- futhark bench: progress bar now correct when number of runs is
  less than 10 (#1050).
- Aliases of arguments passed for consuming parameters are now
  properly checked (#1053).
- When using a GPU backend, errors are now properly cleared.
  Previously, once e.g. an out-of-bounds error had occurred, all
  future operations would fail with the same error.
- Size-coercing a transposed array no longer leads to invalid code
  generation (#1054).
0.16.1
Added
- Incremental flattening is now performed by default. Use
  attributes to constrain and direct the flattening if you have
  exotic needs. This will likely need further iteration and
  refinement.
- Better code generation for reverse (and the equivalent explicit
  slice).
- futhark bench now prints progress bars.
- The cuda backend now supports similar profiling as the opencl
  option, although it is likely slightly less accurate in the
  presence of concurrent operations.
- A preprocessor macro FUTHARK_BACKEND_foo is now defined in
  generated header files, where foo is the name of the backend
  used.
- Non-inlined functions (via #[noinline]) are now supported in GPU
  code, but only for functions that exclusively operate on
  scalars.
- futhark repl now accepts a command line argument to load a
  program initially.
- Attributes are now also permitted on declarations and specs.
- futhark repl now has a :nanbreak command (#839).
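
As an illustration of the FUTHARK_BACKEND_foo macro listed above, client code could branch on the backend at compile time roughly as follows. This is a sketch under assumptions: the helper `which_backend` is hypothetical, and the lowercase spellings `opencl`, `cuda` and `c` follow the backend names used in this changelog rather than anything verified against a generated header.

```c
/* In a real project this would come after including the header that
   Futhark generated for your program; it is omitted here so the sketch
   compiles on its own. */

/* Hypothetical helper: reports which backend macro, if any, is defined.
   ASSUMPTION: the macro suffix is the lowercase backend name. */
const char *which_backend(void) {
#if defined(FUTHARK_BACKEND_opencl)
  return "opencl";
#elif defined(FUTHARK_BACKEND_cuda)
  return "cuda";
#elif defined(FUTHARK_BACKEND_c)
  return "c";
#else
  return "unknown";
#endif
}
```

Compiled without any generated header, none of the macros is defined and the function returns "unknown". The same `#if defined(...)` pattern can guard backend-specific setup code.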
Removed
- The C# backend has been removed (#984).
- The unsafe keyword has been removed. Use #[unsafe] instead.
Changed
- Out-of-bounds literals are now an error rather than a warning.
- Type ascriptions on entry points now always result in opaque types
  when the underlying concrete type is a tuple (#1048).
Fixed
- Fix bug in slice simplification (#992).
- Fixed a type checker bug for tracking the aliases of closures
  (#995).
- Fixed handling of dumb terminals in futhark test (#1000).
- Fixed exotic monomorphisation case involving lifted type
  parameters instantiated with functions that take named parameters
  (#1026).
- Further tightening of the causality restriction (#1042).
- Fixed alias tracking for right-operand operator sections (#1043).