|
1 | 1 | # Examples
|
2 | 2 |
|
3 |
| -## Which cloud service should I use? |
| 3 | +## Which executor should I use? |
4 | 4 |
|
5 |
| -**Modal** is the easiest to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account). |
6 |
| -It has been tested with ~300 workers. |
| 5 | +[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers). |
| 6 | +If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as part of the setup process. |
7 | 7 |
|
8 |
| -**Lithops** requires slightly more work to get started since you have to build a runtime environment first. |
9 |
| -Lithops has support for many serverless services on various cloud providers, but has so far been tested on two: |
| 8 | +[**Modal**](https://modal.com/) is very easy to get started with because it handles building a runtime environment for you automatically (note that you need to [sign up](https://modal.com/signup) for a free account). **At the time of writing, Modal does not guarantee that functions run in any particular cloud region, so running large computations on it is not currently recommended, since excessive data transfer fees are likely.** |
10 | 9 |
|
| 10 | +[**Coiled**](https://www.coiled.io/) is also easy to get started with ([sign up](https://cloud.coiled.io/signup)). It uses [Coiled Functions](https://docs.coiled.io/user_guide/usage/functions/index.html) and has a 1-2 minute overhead to start a cluster. |
11 | 11 |
|
12 |
| -- **AWS lambda** requires building a docker container first, but has been tested with hundreds of workers. |
13 |
| -- **Google Cloud Functions** only requires building a Lithops runtime, which can be created from a pip-style `requirements.txt` without docker. Large-scale testing is ongoing. |
| 12 | +[**Google Cloud Dataflow**](https://cloud.google.com/dataflow) is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is a mature service and therefore should be reliable for much larger computations. |
14 | 13 |
|
15 |
| -**Google Cloud Dataflow** is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is the most mature service and therefore should be reliable for much larger computations. |
| 14 | +## Set up |
16 | 15 |
|
17 |
| -## Lithops (AWS Lambda, S3) |
| 16 | +Follow the instructions for setting up Cubed to run on your chosen cloud and executor runtime: |
18 | 17 |
|
19 |
| -See [Lithops/aws-lambda](lithops/aws-lambda/README.md) |
| 18 | +| Executor | Cloud | Set up instructions | |
| 19 | +|----------|--------|--------------------------------------------------------------| |
| 20 | +| Lithops | AWS | [lithops/aws-lambda/README.md](lithops/aws-lambda/README.md) | |
| 21 | +| | Google | [lithops/gcf/README.md](lithops/gcf/README.md) | |
| 22 | +| Modal | AWS | [modal/aws/README.md](modal/aws/README.md) | |
| 23 | +| | Google | [modal/gcp/README.md](modal/gcp/README.md) | |
| 24 | +| Coiled | AWS | [coiled/aws/README.md](coiled/aws/README.md) | |
| 25 | +| Beam | Google | [dataflow/README.md](dataflow/README.md) | |
20 | 26 |
|
21 |
| -## Lithops (Google Cloud Functions, GCS) |
| 27 | +## Examples |
22 | 28 |
|
23 |
| -See [Lithops/gcf](lithops/gcf/README.md) |
| 29 | +The `add-asarray.py` script is a small example that adds two 4x4 arrays together, and is useful for checking that the runtime is working. |
| 30 | +Export `CUBED_CONFIG` as described in the set up instructions, then run the script. For example, for Lithops on AWS: |
24 | 31 |
|
25 |
| -## Modal (AWS, S3) |
| 32 | +```shell |
| 33 | +export CUBED_CONFIG=$(pwd)/lithops/aws-lambda |
| 34 | +python add-asarray.py |
| 35 | +``` |
26 | 36 |
|
27 |
| -See [Modal/aws](modal/aws/README.md) |
| 37 | +If successful it should print a 4x4 array. |
28 | 38 |
|
29 |
| -## Apache Beam (Google Cloud Dataflow) |
| 39 | +The other examples are run in a similar way: |
30 | 40 |
|
31 |
| -See [Dataflow](dataflow/README.md) |
| 41 | +```shell |
| 42 | +export CUBED_CONFIG=... |
| 43 | +python add-random.py |
| 44 | +``` |
| 45 | + |
| 46 | +and |
| 47 | + |
| 48 | +```shell |
| 49 | +export CUBED_CONFIG=... |
| 50 | +python matmul-random.py |
| 51 | +``` |
| 52 | + |
| 53 | +These will take longer to run as they operate on more data. |
| 54 | + |
| 55 | +The last two examples use `TimelineVisualizationCallback`, which produces a plot showing the timeline of events in the task lifecycle. |
| 56 | +The plots are SVG files and are written to a timestamped subdirectory of the `history` directory. Open the latest one with: |
| 57 | + |
| 58 | +```shell |
| 59 | +open $(ls -d history/compute-* | tail -1)/timeline.svg |
| 60 | +``` |
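
The `open` command in the snippet above is macOS-specific. On other platforms, a small Python helper can locate the latest timeline instead. This is only a sketch: the function name `latest_timeline` is hypothetical and not part of Cubed, and it assumes the `history/compute-*` layout shown above, with timestamps that sort chronologically by name.

```python
from pathlib import Path
from typing import Optional


def latest_timeline(history_dir: str = "history") -> Optional[Path]:
    """Return timeline.svg from the last compute-* directory, or None.

    Hypothetical helper, not part of Cubed.
    """
    # Sorting by name mirrors `ls -d history/compute-* | tail -1`; it assumes
    # the timestamp in each directory name sorts chronologically.
    runs = sorted(p for p in Path(history_dir).glob("compute-*") if p.is_dir())
    if not runs:
        return None
    svg = runs[-1] / "timeline.svg"
    # Only return the path if the plot was actually written.
    return svg if svg.is_file() else None


if __name__ == "__main__":
    path = latest_timeline()
    print(path if path else "no timeline.svg found under history/")
```

Run it from the directory that contains `history`, then open the printed path in a browser or any SVG viewer.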