Releases: StanfordLegion/legion

Version 22.03.0 (March 27, 2022)

28 Mar 21:33
  • Build
    • Minimum supported cmake version is now 3.7. (Some optional features continue to require even newer versions.)
  • Realm
    • Numerous bug fixes in the gasnetex network layer
    • CUDA and HIP support now allows direct specification of which GPUs to use via the -ll:gpu_ids command-line option
    • Added support for copy paths using Cuda IPC between gpus on the same physical node
    • For applications using CUDA without the runtime API hijack AND only submitting work to the default CUDA stream, -cuda:legacysync 1 reduces the overhead of detecting the completion of device-side work launched by a task
    • Realm reduction copies may now indicate exclusive access to the destination instance, improving performance by allowing simple load/store instead of atomic operations
    • Custom reduction operations (including Legion's built-in ones) can provide HIP implementations, permitting in-place reductions in HIP device memory
  • Regent
    • Support for custom serialization of types in task parameters and results
    • New experimental timing library under std/timing
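The exclusive-access reduction copies described above replace an atomic read-modify-write with a plain update. The standalone sketch below contrasts the two paths for a sum; `SumReduction`, `apply_exclusive`, and `apply_shared` are hypothetical names, and this is not Realm's copy code or Legion's reduction interface.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sum reduction used only to contrast the two update paths;
// not Legion's built-in operator or its exact interface.
struct SumReduction {
  using Value = std::int64_t;

  // Exclusive access: no other writer can touch `lhs` concurrently,
  // so an ordinary load/add/store is safe.
  static void apply_exclusive(Value& lhs, Value rhs) { lhs += rhs; }

  // Shared access: concurrent writers may race on the same element, so the
  // update is a compare-and-swap loop (the same loop shape also works for
  // operators such as max).
  static void apply_shared(std::atomic<Value>& lhs, Value rhs) {
    Value old = lhs.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `old`, so the loop retries
    // with the latest value until the store succeeds.
    while (!lhs.compare_exchange_weak(old, old + rhs,
                                      std::memory_order_relaxed)) {
    }
  }
};
```

The shared path pays for the compare-and-swap (and possible retries under contention) on every element; that per-element cost is what an exclusive-access hint lets the runtime skip.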

Version 21.12.0 (December 31, 2021)

05 Jan 17:16
  • Realm
    • Performance improvements for multi-dimensional copies, especially
      inter-process transfers
    • Support for loading CUDA driver (if present) at runtime instead of
      link time, allowing same binary to be used on systems with and without
      CUDA-capable GPUs (enabled with -DLegion_CUDA_DYNAMIC_LOAD=ON in
      cmake build)
    • A separate Memory is now created per process for external (system)
      memory instances. This memory has no capacity for creating instances
      and can confuse applications or Legion mappers that assume exactly
      one Memory of kind SYSTEM_MEM exists. Old behavior can be obtained
      with -ll:ext_sysmem 0, but this can fail for configurations that
      register system memory with the network and/or GPUs
    • The MemoryQuery now supports a has_capacity predicate to restrict
      results to just memories with sufficient total (not current!) capacity
      to allocate an instance of a specified size
  • Build
    • Cmake allows control of max nodes (-DLegion_MAX_NUM_NODES=...) and
      max processors/node (-DLegion_MAX_NUM_PROCS=...) supported by
      Legion build
    • Added dependency tracking to make-based builds
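The has_capacity predicate filters on total capacity, not on space currently free. The sketch below illustrates that distinction with a hypothetical `MemoryInfo` record and `with_capacity` filter (neither is a Realm type); it also shows how a zero-capacity memory, such as the per-process external system memory described above, drops out of the results.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a memory descriptor; not a Realm type.
struct MemoryInfo {
  std::string name;
  std::size_t total_capacity;  // bytes the memory could ever hold
  std::size_t free_bytes;      // bytes currently unallocated
};

// Keep memories whose *total* capacity can hold an instance of `bytes`.
// Current free space is deliberately ignored, so a memory that passes this
// filter may still be too full to satisfy an allocation right now.
std::vector<MemoryInfo> with_capacity(const std::vector<MemoryInfo>& mems,
                                      std::size_t bytes) {
  std::vector<MemoryInfo> out;
  for (const MemoryInfo& m : mems) {
    if (m.total_capacity >= bytes) out.push_back(m);
  }
  return out;
}
```

Because free space is ignored, a mapper using such a filter still has to handle allocation failure.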

Version 21.09.0 (September 28, 2021)

02 Oct 04:20
  • Realm
    • Numerous bug fixes in the gasnetex network layer
    • Support for HIP memory type registration with GASNet (with GASNet version 2021.9.0+)
    • Arguments to spawned tasks may now be arbitrarily large (network-specific limits have been eliminated)
  • Regent
    • Improved support for dynamic checks on index launches with potential interference between different region arguments
    • Extensive fixes for separate compilation. This mode has now been verified to work with large-scale applications
    • Removed long-obsolete support for __demand(__external)
  • Pygion
    • Add support for layout constraints

Version 21.06.0 (June 24, 2021)

25 Jun 15:37
  • Build
    • Version information is now compiled into Realm and Legion. This takes
      the form of a string (e.g. "legion-21.06.0") rather than anything
      that can be compared (i.e. no semantic versioning here). Compile-time
      defines REALM_VERSION and LEGION_VERSION are available as well as
      run-time calls Realm::Runtime::get_library_version and
      Legion::Runtime::get_library_version.
  • Regent
    • Support for dynamic checks on projection functors, enabling a
      much larger class of loops to be supported as index launches
    • Support for local tasks (i.e., without going through the
      runtime) via __demand(__local)
  • Realm
    • Windows (MSVC) builds are now tested in CI and therefore more likely
      to work
    • Realm runtime can now be shutdown and reinitialized in the same process.
      (Exception: GASNet-based network layers do not support this.)
    • Registration of host memory with CUDA driver is skipped for host
      memories larger than 1GB by default due to CUDA driver overhead.
      This threshold can be increased (or decreased) with -cuda:hostreg
  • Tools

Version 21.03.0 (March 30, 2021)

30 Mar 20:42
  • Build
    • Cmake can build an embedded copy of GASNet as part of the Legion build
      with -DLegion_EMBED_GASNet=ON
  • Regent
    • Contains three breaking changes to the Regent calling convention:
      • Reductions are now aggregated into region requirements and
        sorted by the index of the first field in the field space
        among the set of fields for each reduction.
      • Task arguments may be passed through either args or
        local_args for index launched tasks. (Previously Regent
        only used local_args.)
      • Region values passed via args to an index-launched task may
        be bogus. Instead the region requirement should be used to
        obtain the original region.
    • Support for constant time index launches. These are enabled
      automatically, but can be forced on or off with __demand or
      __forbid with __constant_time_launches. This should
      improve scalability at extreme node counts.
    • Support for rescape and remit to generate metaprogrammed
      code more easily.
    • Experimental support for separate compilation via -fseparate 1
      allows Regent programs to be compiled in parts (potentially in
      parallel). Note that separate compilation currently cannot be
      used with Bishop and requires one of either parallel or
      incremental compilation if regentlib.start is used (does not
      apply to regentlib.saveobj or regentlib.save_tasks).
  • Legion
    • In the control replication branch users will find a new implementation
      of Legion's physical analysis that uses heuristics to select which
      sub-trees should be used for performing the analysis. Disjoint and
      complete partitions are especially helpful in aiding the runtime.
    • There is a new implementation of the index space math inside of the
      runtime that now soundly and precisely detects congruences between
      index space math operations. This fixes a long-running class of bugs
      that would cause memory explosions in the physical analysis.
    • In the control replication branch users can now map future values into
      memories the same as they do with regions. This means that future
      payloads can be placed directly on devices like GPUs. Similarly, the
      runtime now accepts future data from tasks that also reside in any
      memory in the machine including device memories.
    • Both the master and control replication branches have support for
      index space attach operations.
    • Expensive transitive reductions on traces are now computed in the
      background allowing trace replays to begin replaying immediately
      with only partial optimizations.
  • Realm
    • Custom reduction operations (including Legion's built-in ones) can
      provide CUDA implementations, permitting in-place reductions in
      CUDA device memory
    • Support for CUDA managed memory (via -ll:msize) that is coherent for
      both host and device access. Includes support for __managed__
      variables (only single-GPU if using CUDA runtime hijack mode)
    • Event::wait may be called outside of Realm tasks, having the same
      thread-blocking behavior as Event::external_wait
    • Experimental support for AMD HIP. Note that testing coverage is
      incomplete, and breakages may occur in between releases. For more
      details, see: #1028
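A transitive reduction, as mentioned for traces above, removes every dependence edge already implied by a longer path. The sketch below computes it for a small DAG stored as an adjacency matrix. `transitive_reduction` is a hypothetical helper, not Legion's tracing code, and it assumes nodes are numbered in topological order (every edge u->v has u < v).

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<bool>>;

// Assumes topological numbering: edge[u][v] may be true only when u < v.
Matrix transitive_reduction(const Matrix& edge) {
  const int n = static_cast<int>(edge.size());
  // reach[u][v] is true iff v is reachable from u by a path of length >= 1.
  Matrix reach(n, std::vector<bool>(n, false));
  for (int u = n - 1; u >= 0; --u) {
    for (int v = u + 1; v < n; ++v) {
      if (!edge[u][v]) continue;
      reach[u][v] = true;
      // reach[v] is already final because v > u was processed earlier.
      for (int w = v + 1; w < n; ++w)
        if (reach[v][w]) reach[u][w] = true;
    }
  }
  // Drop u->v if another successor w of u already reaches v.
  Matrix reduced = edge;
  for (int u = 0; u < n; ++u)
    for (int v = u + 1; v < n; ++v) {
      if (!edge[u][v]) continue;
      for (int w = u + 1; w < v; ++w)
        if (edge[u][w] && reach[w][v]) {
          reduced[u][v] = false;
          break;
        }
    }
  return reduced;
}
```

This direct approach costs roughly O(n^3) for n operations. That is why it helps to run such a computation in the background while the trace replays with only partial optimizations.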

Version 20.12.0 (December 28, 2020)

01 Jan 15:55
  • Build
    • Legion and Realm now require a compiler with (at least) C++11 support
    • Python scripts (e.g. legion_prof and legion_spy) require Python 3.5
  • Realm
    • Improved performance of inter-node instance copies when data is not
      contiguous in source and/or destination
    • Improved responsiveness of utility processors by not using them for
      background work by default
    • Experimental support for building on Windows with MSVC
    • Improved performance (and correctness) when running CUDA tasks without
      the runtime hijack enabled
    • Added gasnetex network layer that uses GASNet-EX's native API (instead
      of the legacy GASNet-1 API support). Requires GASNet version 2020.11.0
      or newer. For more details, see: #986
  • Legion
    • The mapping interface no longer requires the runtime to return valid
      instances for empty regions (e.g. regions with no points in their index space)
  • Tools
    • Legion Spy now has support for arbitrary number of dimensions
  • Examples
    • examples/nccl gives a simple example of using NCCL with Legion

Version 20.06.0 (June 29, 2020)

30 Jun 22:47
  • Regent
    • Support for std/format module for type-safe formatted printing
    • Support for documentation with LDoc
    • Support for __future operator to import a C API future
  • Legion
    • Support for inlining tasks into leaf contexts
    • Support for global registration callbacks inside of tasks
    • Added semantic tags for source file and line location
    • Support for multi-region accessors for region requirements with
      co-location constraints
    • Changes to semantics of deletion for index spaces, field spaces, and
      logical regions. For details, see: #812
    • Support for creating field spaces with initial fields
  • Realm
    • Subgraphs can be used to capture a template of Realm operations
      that will be executed repeatedly. Subgraph definitions include
      support for "interpolating" values into individual operations'
      arguments on each instantiation of the subgraph template
    • create_weighted_subspaces supports size_t weights for precise
      control over the size of each subspace
    • Added support for omp critical constructs and dynamic loop
      schedules in OpenMP tasks
    • Added support for cudaStreamLegacy and cudaStreamPerThread in
      CUDA tasks
    • Realm logs now include a timestamp (relative to runtime init)
      by default. This behavior can be disabled with -logtime 0
    • Performance improvements for copies/fills of 3D instances in
      GPU device memory
    • Added ability to compute a set of "covering rectangles" for sparse
      index spaces, allowing more compact representation in memory
    • Added MultiAffineAccessor for accessing compact instances
    • Added ability to delete a ProcessorGroup
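Integer (size_t) weights allow exact subspace sizes because the split can be done with integer arithmetic and no floating-point rounding. The sketch below splits a 1-D range by cumulative rounding, one common approach; `weighted_sizes` is a hypothetical helper, and Realm's create_weighted_subspaces may compute its boundaries differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split `total` points into contiguous pieces sized in
// proportion to integer `weights`. Piece i ends at
// bound(i) = floor(total * (w_0 + ... + w_i) / W), so the sizes always sum to
// `total`, and if the weights already sum to `total` each piece gets exactly
// its weight. Assumes total * W does not overflow std::size_t.
std::vector<std::size_t> weighted_sizes(std::size_t total,
                                        const std::vector<std::size_t>& weights) {
  std::size_t W = 0;
  for (std::size_t w : weights) W += w;
  // With all-zero weights there is nothing to be proportional to.
  if (W == 0) return std::vector<std::size_t>(weights.size(), 0);

  std::vector<std::size_t> sizes;
  std::size_t prefix = 0, prev_bound = 0;
  for (std::size_t w : weights) {
    prefix += w;
    const std::size_t bound = total * prefix / W;
    sizes.push_back(bound - prev_bound);
    prev_bound = bound;
  }
  return sizes;
}
```

For example, weights {3, 3, 4} over 10 points give pieces of exactly 3, 3, and 4. Equal weights {1, 1, 1} over the same 10 points also give 3, 3, and 4, with the remainder landing in the last piece.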

Version 20.03.0 (March 31, 2020)

31 Mar 22:54
  • Regent
    • Behavior change: __fields and __physical now both require explicit field names, i.e., __fields(r.{x, y}) rather than __fields(r). This makes the behavior less ambiguous and helps to avoid bugs
    • Added complete and incomplete keywords that can be used to mark partitions as such
    • Added support for setting mapper ID and tag via t:set_mapper_id() and t:set_mapping_tag_id()
    • Initial support for predicated execution of if and while statements
    • Fixed several bugs, memory leaks and improved compile times
  • Legion
    • Introduction of Fortran bindings for Legion
    • Support for creating deferred index spaces from future values
    • Support for construction of partitions from a map of domains or from a future map
    • Support for reducing a future map to a single future asynchronously
  • Realm
    • Support for Kokkos parallel launch constructs in Realm (and therefore Legion) tasks. Currently supported Kokkos execution spaces are: Serial, OpenMP, CUDA. Application data remains in logical regions, but accessors can be converted to Kokkos (unmanaged) Views if needed. See the kokkos_interop example
    • Introduction of experimental MPI-based network layer, enabled with REALM_NETWORKS=mpi (make) or -DRealm_NETWORKS=mpi (cmake). Use REALM_NETWORKS=gasnet1 (or USE_GASNET=1, which still works) for the GASNet-based network layer (which works with GASNet-1 or GASNet-EX)
    • CUDA Runtime API interposer (a.k.a. "hijack") can now be disabled with USE_CUDART_HIJACK=0 (make) or -DLegion_HIJACK_CUDART=OFF (cmake). This can reduce effectiveness of task-parallelism for CUDA tasks, so use only if needed
    • More control over GPU selection via: -cuda:skipgpus N which leaves the first N GPUs available for other uses, -cuda:skipbusy which skips over busy GPUs, and -cuda:minavailmem M which skips GPUs with less than M device memory available
    • Reduction in memory usage of Realm internal data structures
  • Tools
    • There is now a generic launcher script for running Python code with Legion that will execute an arbitrary Python program in the top-level task of a Legion program. This script mirrors the interface to CPython as closely as possible.
    • Legion Spy now supports verification and rendering of indirection copies
    • Legion Prof supports Instance layout constraints related to dimension ordering and field alignment
    • Legion Prof contains a menu option for viewing ready state of operations
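The three GPU-selection options above (-cuda:skipgpus, -cuda:skipbusy, -cuda:minavailmem) act as successive filters over the visible devices. The sketch below models that idea with hypothetical `GpuInfo`/`GpuSelection` types and a `select_gpus` function. The filter order, how "busy" is detected, and the unit of M (bytes here) are all assumptions, not Realm's actual logic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-GPU snapshot; not a Realm type.
struct GpuInfo {
  int index;
  bool busy;              // assumed: device already in use elsewhere
  std::size_t free_mem;   // currently available device memory (bytes assumed)
};

// Hypothetical options loosely mirroring the three command-line flags.
struct GpuSelection {
  int skip_first = 0;             // like -cuda:skipgpus N
  bool skip_busy = false;         // like -cuda:skipbusy
  std::size_t min_avail_mem = 0;  // like -cuda:minavailmem M
};

// Returns indices of GPUs passing every filter, in device order, up to `want`.
std::vector<int> select_gpus(const std::vector<GpuInfo>& gpus,
                             const GpuSelection& opt, int want) {
  std::vector<int> chosen;
  for (std::size_t i = 0;
       i < gpus.size() && static_cast<int>(chosen.size()) < want; ++i) {
    if (static_cast<int>(i) < opt.skip_first) continue;  // leave first N alone
    const GpuInfo& g = gpus[i];
    if (opt.skip_busy && g.busy) continue;
    if (g.free_mem < opt.min_avail_mem) continue;
    chosen.push_back(g.index);
  }
  return chosen;
}
```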

Version 19.12.0 (December 31, 2019)

01 Jan 00:32
  • Build
    • Both builds (Make and CMake) now generate legion_defines.h and
      realm_defines.h. By default these headers are generated in
      the source directory (Make) or build directory (CMake). This
      means that languages such as Regent and Python no longer
      require MAX_DIM to be specified explicitly
  • Regent
    • Support for CUDA 10
    • Support for field polymorphic tasks
    • Substantially improved the generality of the index launch
      optimization. Task arguments of the form p[i+k] may now be
      used, where k is a variable defined outside of the loop
    • Add flag -foverride-demand-index-launch which can be used to
      force loops to be index launched in cases where the compiler
      cannot prove the disjointness of read-write region
      arguments
    • Added reductions for complex64
    • The scripts install.py and setup_env.py now use CMake to
      build Terra by default, which should improve portability on
      most machines
    • The behavior of -fcuda 1 has changed: this flag will now issue
      an error if CUDA cannot be enabled (e.g. because the build
      does not support CUDA, or because the machine has no
      GPUs). Omitting this flag will now enable CUDA if it is
      available (and will not error if it is not available).
      The behavior of -fopenmp 1 has changed similarly.
    • The behavior of __demand(__cuda) has changed. This will now
      issue an error if a loop is not eligible for the CUDA
      transformation, regardless of whether CUDA is actually
      available on the current machine or not. The behavior of
      __demand(__openmp) has changed similarly.
    • The annotation __allow(__cuda) is now permitted, and permits
      (but does not require) tasks to be optimized with CUDA.
    • Experimental support for 2D kernel launch in the CUDA code generation
  • Python
    • Add support for copies
    • Copies and fills now support multiple fields
    • Tasks (including index launches) now support setting the mapper
      ID and tag
  • Legion
    • A major overhaul of the Legion physical analysis to use an
      approach based on bounding volume hierarchies. The change is
      not visible to users, but will likely impact performance. Most
      programs will get faster; programs that create many partitions
      frequently on the fly may get slower. The latter case will be fixed
      in an upcoming release.
    • Added support for indirect copy operations such as gather and
      scatter onto existing copy launchers
  • Realm
    • Event::subscribe allows polling via Event::has_triggered to
      (eventually) succeed
    • Addition of CompletionQueue objects that allow multiple unordered
      Event triggers to be efficiently handled by a single consumer
    • Support for omp_get_level, omp_in_parallel, and
      omp_set_num_threads in tasks running on OpenMP processors
    • Support for unstructured scatter and/or gather in copies. (Handling
      structured cases as well as fills/reductions remains a work in
      progress.)
    • Removed all calls to Event::wait from inside other Realm API calls.
      Applications now must make sure that index spaces and instance
      metadata are valid before use. For details, see: #465
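Unstructured gather and scatter are copies whose source or destination locations come from a data-dependent index array rather than an affine formula. The sequential sketch below shows only those semantics. `gather` and `scatter` are hypothetical helpers, and the index vectors stand in for whatever indirection data a real copy uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: dst[i] = src[idx[i]]. The index array is data-dependent
// ("unstructured"), so no affine formula describes the access pattern.
template <typename T>
void gather(std::vector<T>& dst, const std::vector<T>& src,
            const std::vector<std::size_t>& idx) {
  dst.resize(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i) dst[i] = src[idx[i]];
}

// Scatter: dst[idx[i]] = src[i]. If idx contains duplicates, the last write
// wins in this sequential sketch; a parallel copy engine need not guarantee
// any particular order for such conflicting writes.
template <typename T>
void scatter(std::vector<T>& dst, const std::vector<T>& src,
             const std::vector<std::size_t>& idx) {
  for (std::size_t i = 0; i < idx.size(); ++i) dst[idx[i]] = src[i];
}
```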

Version 19.09.0 (September 9, 2019)

10 Sep 18:01
  • Regent
    • __demand(__index_launch) has been added as an alternative to __demand(__parallel) on for loops that avoids confusion with the auto-parallelizer. __demand(__parallel) on for loops is deprecated and now issues a warning; in a future release this warning will be upgraded to an error. For details, see: #520
    • Multi-field expansion is deprecated and now issues an error. The error can be temporarily downgraded to a warning, but it is advised that users migrate codes away from this syntax as it will become a hard error in a future release. For details, see: #501
  • Legion
    • Support for a built-in collection of reduction operators including sum, product, max, and min over a variety of types for CPUs and GPUs
  • Realm
    • Assorted bug, performance, and memory leak fixes
    • Fills to attached HDF5 instances are orders of magnitude faster
    • Support for reusing HDF5 file handles with -hdf5:openfiles option
    • Control which rank opens an HDF5 file with a rank=nnn: filename prefix
  • Build System
    • Makefile-based flow attempts to detect CUDA location and GASNet conduit if they are not specified
    • Makefile-based flow defaults to building CUDA fat binaries, but can still be overridden with the GPU_ARCH setting, which now accepts SM arch numbers (e.g. "70") as well as names (e.g. "volta")
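The rank=nnn: filename prefix for HDF5 files can be illustrated with a small parser. `parse_rank_prefix` and `RankedFile` are hypothetical names, and Realm's own parsing rules may differ, for example in how it treats malformed prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of splitting an optional "rank=<n>:" prefix off a filename.
struct RankedFile {
  bool has_rank;     // true if a well-formed rank prefix was present
  int rank;          // rank that should open the file (valid if has_rank)
  std::string path;  // filename with any prefix removed
};

// Parses "rank=3:data.h5" into {true, 3, "data.h5"}. Anything that does not
// match the prefix form is returned unchanged with has_rank == false.
// (std::stoi throws if the digit string overflows int.)
RankedFile parse_rank_prefix(const std::string& name) {
  const std::string tag = "rank=";
  if (name.compare(0, tag.size(), tag) == 0) {
    const std::size_t colon = name.find(':', tag.size());
    if (colon != std::string::npos && colon > tag.size()) {
      const std::string digits = name.substr(tag.size(), colon - tag.size());
      if (digits.find_first_not_of("0123456789") == std::string::npos)
        return {true, std::stoi(digits), name.substr(colon + 1)};
    }
  }
  return {false, -1, name};
}
```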