
Commit 3ecc820

Create a single set of example scripts that can run on any executor by specifying external config.

1 parent aedc350 · commit 3ecc820

33 files changed: +174 −630 lines changed

examples/README.md

Lines changed: 45 additions & 16 deletions

````diff
@@ -1,31 +1,60 @@
 # Examples
 
-## Which cloud service should I use?
+## Which executor should I use?
 
-**Modal** is the easiest to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account).
-It has been tested with ~300 workers.
+[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers).
+If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as part of the set-up process.
 
-**Lithops** requires slightly more work to get started since you have to build a runtime environment first.
-Lithops has support for many serverless services on various cloud providers, but has so far been tested on two:
+[**Modal**](https://modal.com/) is very easy to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account). **At the time of writing, Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations, since excessive data transfer fees are likely.**
 
+[**Coiled**](https://www.coiled.io/) is also easy to get started with ([sign up](https://cloud.coiled.io/signup)). It uses [Coiled Functions](https://docs.coiled.io/user_guide/usage/functions/index.html) and has a 1-2 minute overhead to start a cluster.
 
-- **AWS lambda** requires building a docker container first, but has been tested with hundreds of workers.
-- **Google Cloud Functions** only requires building a Lithops runtime, which can be created from a pip-style `requirements.txt` without docker. Large-scale testing is ongoing.
+[**Google Cloud Dataflow**](https://cloud.google.com/dataflow) is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is a mature service and should therefore be reliable for much larger computations.
 
-**Google Cloud Dataflow** is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is the most mature service and therefore should be reliable for much larger computations.
+## Set up
 
-## Lithops (AWS Lambda, S3)
+Follow the instructions for setting up Cubed to run on your chosen cloud and executor runtime:
 
-See [Lithops/aws-lambda](lithops/aws-lambda/README.md)
+| Executor | Cloud  | Set up instructions                                          |
+|----------|--------|--------------------------------------------------------------|
+| Lithops  | AWS    | [lithops/aws-lambda/README.md](lithops/aws-lambda/README.md) |
+|          | Google | [lithops/gcf/README.md](lithops/gcf/README.md)               |
+| Modal    | AWS    | [modal/aws/README.md](modal/aws/README.md)                   |
+|          | Google | [modal/gcp/README.md](modal/gcp/README.md)                   |
+| Coiled   | AWS    | [coiled/aws/README.md](coiled/aws/README.md)                 |
+| Beam     | Google | [dataflow/README.md](dataflow/README.md)                     |
 
-## Lithops (Google Cloud Functions, GCS)
+## Examples
 
-See [Lithops/gcf](lithops/gcf/README.md)
+The `add-asarray.py` script is a small example that adds two 4x4 arrays together, and is useful for checking that the runtime is working.
+Export `CUBED_CONFIG` as described in the set up instructions, then run the script. This is for Lithops on AWS:
 
-## Modal (AWS, S3)
+```shell
+export CUBED_CONFIG=$(pwd)/lithops/aws-lambda
+python add-asarray.py
+```
 
-See [Modal/aws](modal/aws/README.md)
+If successful it should print a 4x4 array.
 
-## Apache Beam (Google Cloud Dataflow)
+The other examples are run in a similar way:
 
-See [Dataflow](dataflow/README.md)
+```shell
+export CUBED_CONFIG=...
+python add-random.py
+```
+
+and
+
+```shell
+export CUBED_CONFIG=...
+python matmul-random.py
+```
+
+These will take longer to run as they operate on more data.
+
+The last two examples use `TimelineVisualizationCallback`, which produces a plot showing the timeline of events in the task lifecycle.
+The plots are SVG files and are written to a timestamped directory under `history`. Open the latest one with
+
+```shell
+open $(ls -d history/compute-* | tail -1)/timeline.svg
+```
````
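The run instructions above assume that `CUBED_CONFIG` points at a directory (such as `lithops/aws-lambda`) containing a `cubed.yaml`. As a quick sanity check before running a script, something like the following illustrative snippet (not part of the commit) can confirm the config file is where the examples expect it:

```python
import os

# Hypothetical check: the examples assume CUBED_CONFIG points at a
# directory containing a cubed.yaml file.
config_dir = os.environ.get("CUBED_CONFIG", "")
config_file = os.path.join(config_dir, "cubed.yaml")
print(f"{config_file}: {'found' if os.path.exists(config_file) else 'missing'}")
```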

examples/add-asarray.py

Lines changed: 14 additions & 0 deletions

```diff
@@ -0,0 +1,14 @@
+import cubed.array_api as xp
+
+if __name__ == "__main__":
+    a = xp.asarray(
+        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
+        chunks=(2, 2),
+    )
+    b = xp.asarray(
+        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
+        chunks=(2, 2),
+    )
+    c = xp.add(a, b)
+    res = c.compute()
+    print(res)
```
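Since the two inputs are identical, the result is simply each element doubled. A small NumPy check (illustrative, not part of the commit) of the 4x4 array the script should print:

```python
import numpy as np

# The script adds the 4x4 array [[1..4], ..., [13..16]] to itself,
# so each element of the result is doubled.
expected = 2 * np.arange(1, 17).reshape(4, 4)
print(expected)
# [[ 2  4  6  8]
#  [10 12 14 16]
#  [18 20 22 24]
#  [26 28 30 32]]
```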

examples/add-random.py

Lines changed: 27 additions & 0 deletions

```diff
@@ -0,0 +1,27 @@
+import logging
+
+import cubed
+import cubed.array_api as xp
+import cubed.random
+from cubed.extensions.history import HistoryCallback
+from cubed.extensions.rich import RichProgressBar
+from cubed.extensions.timeline import TimelineVisualizationCallback
+
+# suppress harmless connection pool warnings
+logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)
+
+if __name__ == "__main__":
+    # 200MB chunks
+    a = cubed.random.random((50000, 50000), chunks=(5000, 5000))
+    b = cubed.random.random((50000, 50000), chunks=(5000, 5000))
+    c = xp.add(a, b)
+
+    progress = RichProgressBar()
+    hist = HistoryCallback()
+    timeline_viz = TimelineVisualizationCallback()
+    # use store=None to write to temporary zarr
+    cubed.to_zarr(
+        c,
+        store=None,
+        callbacks=[progress, hist, timeline_viz],
+    )
```
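The "200MB chunks" comment follows from the chunk shape: each 5000x5000 chunk of float64 values is 5000 × 5000 × 8 bytes. Illustrative arithmetic (not part of the commit):

```python
# Each chunk is 5000 x 5000 float64 values (8 bytes each).
chunk_bytes = 5000 * 5000 * 8
print(chunk_bytes)  # 200_000_000 bytes = 200 MB

# The full 50000 x 50000 array is 10 x 10 chunks of this size: 20 GB per array.
array_bytes = 50000 * 50000 * 8
print(array_bytes // chunk_bytes)  # 100 chunks
```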

examples/coiled/aws/README.md

Lines changed: 5 additions & 11 deletions

````diff
@@ -12,23 +12,17 @@
 3. Install a Python environment with the coiled package in it by running the following from this directory:
 
 ```shell
-conda create -n cubed-coiled-examples python=3.9 -y
-conda activate cubed-coiled-examples
+conda create --name cubed-coiled-aws-examples -y python=3.10
+conda activate cubed-coiled-aws-examples
 pip install 'cubed[coiled]'
 ```
 
 ## Examples
 
-Start with the simplest example:
+Before running the examples, first change to the top-level examples directory (`cd ../..`) and type
 
 ```shell
-python coiled-add-asarray.py "s3://cubed-$USER-temp"
+export CUBED_CONFIG=$(pwd)/coiled/aws
 ```
 
-If successful it should print a 4x4 matrix.
-
-Run the other example in a similar way
-
-```shell
-python coiled-add-random.py "s3://cubed-modal-$USER-temp"
-```
+Then you can run the examples described [there](../../README.md).
````

examples/coiled/aws/coiled-add-asarray.py

Lines changed: 0 additions & 29 deletions
This file was deleted.

examples/coiled/aws/coiled-add-random.py

Lines changed: 0 additions & 36 deletions
This file was deleted.

examples/coiled/aws/cubed.yaml

Lines changed: 10 additions & 0 deletions

```diff
@@ -0,0 +1,10 @@
+spec:
+  work_dir: "s3://cubed-$USER-temp"
+  allowed_mem: "2GB"
+  executor_name: "coiled"
+  executor_options:
+    minimum_workers: 10  # cluster will adapt to this minimum size
+    memory: ["2 GiB", "8 GiB"]  # memory range, lower value must be at least allowed_mem
+    spot_policy: "spot_with_fallback"  # recommended
+    account: null  # use your default account (or change to use a specific account)
+    keepalive: "5 minutes"  # change this to keep clusters alive longer
```
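For comparison, the `spec` section of a `cubed.yaml` corresponds to what could otherwise be constructed directly in code. A rough sketch, assuming `cubed.Spec` and the `spec=` keyword on array creation (executor options omitted; the bucket name is a placeholder):

```python
import cubed
import cubed.array_api as xp

# Roughly equivalent to the spec section above, built in code instead of
# loaded via CUBED_CONFIG (bucket name is a placeholder).
spec = cubed.Spec(work_dir="s3://example-bucket-temp", allowed_mem="2GB")

a = xp.asarray([[1, 2], [3, 4]], chunks=(2, 2), spec=spec)
b = xp.asarray([[1, 2], [3, 4]], chunks=(2, 2), spec=spec)
print(xp.add(a, b).compute())
```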

examples/lithops/aws-lambda/README.md

Lines changed: 3 additions & 25 deletions

````diff
@@ -35,35 +35,13 @@ ulimit -n 1024
 
 ## Running
 
-Start with the simplest example:
+Before running the examples, first change to the top-level examples directory (`cd ../..`) and type
 
 ```shell
-python lithops-add-asarray.py "s3://cubed-$USER-temp" cubed-runtime
+export CUBED_CONFIG=$(pwd)/lithops/aws-lambda
 ```
 
-If successful it should print a 4x4 matrix.
-
-Run the other examples in a similar way
-
-```shell
-python lithops-add-random.py "s3://cubed-$USER-temp" cubed-runtime
-```
-
-and
-
-```shell
-python lithops-matmul-random.py "s3://cubed-$USER-temp" cubed-runtime
-```
-
-These will take longer to run as they operate on more data.
-
-The last two examples use `TimelineVisualizationCallback` which produce a plot showing the timeline of events in the task lifecycle.
-The plots are `png` files and are written in the `history` directory in a directory with a timestamp. Open the latest one with
-
-```shell
-open $(ls -d history/compute-* | tail -1)/timeline.png
-```
+Then you can run the examples described [there](../../README.md).
 
 ## Cleaning up
````

examples/lithops/aws-lambda/cubed.yaml

Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+spec:
+  work_dir: "s3://cubed-$USER-temp"
+  allowed_mem: "2GB"
+  executor_name: "lithops"
+  executor_options:
+    runtime: "cubed-runtime-dev"
+    runtime_memory: 2000
```
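Note that `runtime_memory` is given in megabytes, while `allowed_mem` is a string with units, so it is worth checking they are consistent; presumably the runtime memory must cover `allowed_mem`, mirroring the comment on the Coiled memory range above. An illustrative check (not part of the commit):

```python
# runtime_memory is in MB; allowed_mem is "2GB".
# Presumably the Lambda runtime memory must be at least allowed_mem,
# as the Coiled config's comment states for its memory range.
allowed_mem_bytes = 2 * 1000**3        # "2GB"
runtime_memory_bytes = 2000 * 1000**2  # runtime_memory: 2000 (MB)
assert runtime_memory_bytes >= allowed_mem_bytes  # exactly equal here
```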

examples/lithops/aws-lambda/lithops-add-asarray.py

Lines changed: 0 additions & 24 deletions
This file was deleted.

examples/lithops/aws-lambda/lithops-add-random.py

Lines changed: 0 additions & 42 deletions
This file was deleted.

examples/lithops/aws-lambda/lithops-matmul-random.py

Lines changed: 0 additions & 44 deletions
This file was deleted.
examples/lithops/aws-lambda/requirements.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
 cubed
 lithops[aws]
 s3fs
-tqdm
+rich
```
