<p align="center">
<img src="docs/celerity_logo.png" alt="Celerity Logo">
</p>
# Celerity Legacy Branch: CCGrid / Multi-GPU

This is the version of Celerity used for the benchmarks in the CCGrid 2023 paper
"An asynchronous dataflow-driven execution model for distributed accelerator computing".
It is based on the old buffer-manager runtime, with experimental multi-GPU support and
CUDA-specific hacks for asynchronous copies using CUDA streams. Reductions and many tests
are broken in this version.

# Celerity Runtime - [![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/celerity/celerity-runtime/blob/master/LICENSE) [![Semver 2.0](https://img.shields.io/badge/semver-2.0.0-blue)](https://semver.org/spec/v2.0.0.html) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/celerity/celerity-runtime/blob/master/CONTRIBUTING.md)

The Celerity distributed runtime and API aims to bring the power and ease of
use of [SYCL](https://sycl.tech) to distributed memory clusters.

> If you want a step-by-step introduction on how to set up dependencies and
> implement your first Celerity application, check out the
> [tutorial](docs/tutorial.md)!
## Overview

Programming modern accelerators is already challenging in and of itself.
Combine it with the distributed memory semantics of a cluster, and the
complexity can become so daunting that many leave it unattempted. Celerity
wants to relieve you of some of this burden, allowing you to target
accelerator clusters with programs that look like they are written for a
single device.

### High-level API based on SYCL

Celerity makes it a priority to stay as close to the SYCL API as possible. If
you have an existing SYCL application, you should be able to migrate it to
Celerity without much hassle. If you know SYCL already, this will probably
look very familiar to you:

```cpp
celerity::buffer<float> buf{celerity::range<1>{1024}};
queue.submit([=](celerity::handler& cgh) {
    celerity::accessor acc{buf, cgh,
        celerity::access::one_to_one{},              // 1
        celerity::write_only, celerity::no_init};
    cgh.parallel_for<class MyKernel>(
        celerity::range<1>{1024},                    // 2
        [=](celerity::item<1> item) {                // 3
            acc[item] = sycl::sin(item[0] / 1024.f); // 4
        });
});
```

1. Provide a [range-mapper](docs/range-mappers.md) to tell Celerity which
   parts of the buffer will be accessed by the kernel; a further range-mapper
   example follows after this list.

2. Submit a kernel to be executed by 1024 parallel _work items_. This kernel
   may be split across any number of nodes.

3. Kernels can be expressed as C++11 lambda functions, just like in SYCL. In
   fact, no changes to your existing kernels are required.

4. Access your buffers as if they reside on a single device -- even though
   they might be scattered throughout the cluster.
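
As a further illustration of range mappers, here is a sketch of a simple
3-point stencil (the kernel and variable names are made up, and it assumes the
built-in `neighborhood` range mapper shipped with this Celerity version): each
work item declares that it reads its own element plus one neighbor on either
side, and Celerity derives from that which parts of `input` every node needs.

```cpp
#include <celerity.h>

int main() {
    celerity::distr_queue queue;

    const celerity::range<1> size{1024};
    celerity::buffer<float, 1> input{size};  // assume this was filled by an earlier kernel
    celerity::buffer<float, 1> output{size};

    queue.submit([=](celerity::handler& cgh) {
        // Each work item reads its own element plus one neighbor on either side...
        celerity::accessor in{input, cgh, celerity::access::neighborhood{1},
            celerity::read_only};
        // ...and writes exactly one element of the output.
        celerity::accessor out{output, cgh, celerity::access::one_to_one{},
            celerity::write_only, celerity::no_init};

        cgh.parallel_for<class Blur>(size, [=](celerity::item<1> item) {
            const size_t i = item[0];
            const float left = i > 0 ? in[i - 1] : 0.f;
            const float right = i < size[0] - 1 ? in[i + 1] : 0.f;
            out[i] = (left + in[i] + right) / 3.f;
        });
    });
}
```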

### Run it like any other MPI application

The kernel shown above can be run on a single GPU, just like in SYCL, or on a
whole cluster -- without having to change anything about the program itself.

For example, if we were to run it on two GPUs using `mpirun -n 2 ./my_example`,
the first GPU might compute the range `0-512` of the kernel, while the second
one computes `512-1024`. However, as the user, you don't have to care how
exactly your computation is being split up.

To see how you can use the result of your computation, look at some of our
fully-fledged [examples](examples), or follow the
[tutorial](docs/tutorial.md)!
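
As a small taste of what that looks like, the following sketch continues the
snippet above and reads the result back on the master node. It assumes the
host-task API of this Celerity version (`celerity::access::all`,
`celerity::read_only_host_task` and `celerity::on_master_node`):

```cpp
// Read back the buffer produced by the kernel above (requires <cstdio> for printf).
queue.submit([=](celerity::handler& cgh) {
    // `all` requests the entire buffer, no matter which node computed which
    // part; Celerity gathers the data onto the master node as needed.
    celerity::accessor acc{buf, cgh, celerity::access::all{},
        celerity::read_only_host_task};
    cgh.host_task(celerity::on_master_node, [=] {
        printf("buf[0] = %f\n", acc[0]);
    });
});
```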

## Building Celerity

Celerity uses CMake as its build system. The build process itself is rather
simple; however, you have to make sure that you have a few dependencies
installed first.

### Dependencies

- A supported SYCL implementation, either
  - [AdaptiveCpp](https://github.com/AdaptiveCpp/AdaptiveCpp) (formerly hipSYCL),
  - [DPC++](https://github.com/intel/llvm), or
  - [SimSYCL](https://github.com/celerity/SimSYCL)
- An MPI 2 implementation (tested with OpenMPI 4.0; MPICH 3.3 should work as well)
- [CMake](https://www.cmake.org) (3.13 or newer)
- A C++17 compiler

See the [platform support guide](docs/platform-support.md) on which library and OS versions are supported and
automatically tested.

Building can be as simple as calling `cmake && make`; depending on your setup,
you may however also have to provide some library paths, etc.
See our [installation guide](docs/installation.md) for more information.

The runtime comes with several [examples](examples) that can be used as a starting
point for developing your own Celerity application. All examples will also be built
automatically in-tree when the `CELERITY_BUILD_EXAMPLES` CMake option is set
(true by default).

## Using Celerity as a Library

Simply run `make install` (or equivalent, depending on build system) to copy
all relevant header files and libraries to the `CMAKE_INSTALL_PREFIX`. This
includes a CMake [package configuration file](https://cmake.org/cmake/help/latest/manual/cmake-packages.7.html#package-configuration-file)
which is placed inside the `lib/cmake/Celerity` directory. You can then use
`find_package(Celerity CONFIG)` to include Celerity into your CMake project.
Once included, you can use the `add_celerity_to_target(TARGET target SOURCES source1 source2...)`
function to set up the required dependencies for a target (no need to link manually).
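
Once the target is set up, a consuming source file only needs the main
`celerity.h` header. A minimal sketch of such a file (the kernel name is
made up):

```cpp
#include <celerity.h>

int main() {
    celerity::distr_queue queue;
    celerity::buffer<int, 1> buf{celerity::range<1>{256}};

    queue.submit([=](celerity::handler& cgh) {
        celerity::accessor acc{buf, cgh, celerity::access::one_to_one{},
            celerity::write_only, celerity::no_init};
        cgh.parallel_for<class FillBuffer>(buf.get_range(),
            [=](celerity::item<1> item) { acc[item] = static_cast<int>(item[0]); });
    });
}
```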

## Running a Celerity Application

Celerity is built on top of MPI, which means a Celerity application can be
executed like any other MPI application (i.e., using `mpirun` or equivalent).
There are several environment variables that you can use to influence
Celerity's runtime behavior:

### Environment Variables

- `CELERITY_LOG_LEVEL` controls the logging output level. One of `trace`, `debug`,
  `info`, `warn`, `err`, `critical`, or `off`.
- `CELERITY_DEVICES` can be used to assign different compute devices to Celerity worker
  nodes on a single host. The syntax is as follows:
  `CELERITY_DEVICES="<platform_id> <first device_id> <second device_id> ... <nth device_id>"`.
  Note that this should normally not be required, as Celerity will attempt to
  automatically assign a unique device to each worker on a host.
- `CELERITY_PROFILE_KERNEL` controls whether SYCL queue profiling information
  should be queried (currently not supported when using hipSYCL).
- `CELERITY_GRAPH_PRINT_MAX_VERTS` sets the maximum number of vertices a
  task/command graph may have before its GraphViz output is omitted.
- `CELERITY_DRY_RUN_NODES` takes a number and simulates a run with that many nodes
  without actually executing the commands.

## Disclaimer

Celerity is a research project first and foremost, and is still in
early development. While it does work for certain applications, it probably
does not fully support your use case just yet. We'd however love for you to
give it a try and tell us about how you could imagine using Celerity for your
projects in the future!

Dependencies and the code interfacing with SYCL have been updated to allow building
this legacy branch with recent compilers and SYCL implementations (i.e., AdaptiveCpp).
