diff --git a/examples/README.md b/examples/README.md index 3256b7eba..57e9a7a4b 100644 --- a/examples/README.md +++ b/examples/README.md @@ -1,31 +1,60 @@ # Examples -## Which cloud service should I use? +## Which executor should I use? -**Modal** is the easiest to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account). -It has been tested with ~300 workers. +[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers). +If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as part of the set up process. -**Lithops** requires slightly more work to get started since you have to build a runtime environment first. -Lithops has support for many serverless services on various cloud providers, but has so far been tested on two: +[**Modal**](https://modal.com/) is very easy to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account). **At the time of writing, Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are likely.** +[**Coiled**](https://www.coiled.io/) is also easy to get started with ([sign up](https://cloud.coiled.io/signup)). It uses [Coiled Functions](https://docs.coiled.io/user_guide/usage/functions/index.html) and has a 1-2 minute overhead to start a cluster. -- **AWS lambda** requires building a docker container first, but has been tested with hundreds of workers. -- **Google Cloud Functions** only requires building a Lithops runtime, which can be created from a pip-style `requirements.txt` without docker. Large-scale testing is ongoing. +[**Google Cloud Dataflow**](https://cloud.google.com/dataflow) is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is a mature service and therefore should be reliable for much larger computations. -**Google Cloud Dataflow** is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is the most mature service and therefore should be reliable for much larger computations.
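+
+Whichever executor you choose, the examples configure it in the same way: with a `cubed.yaml` file in a directory that the `CUBED_CONFIG` environment variable points to (see the set up instructions below). As a rough sketch, using the Lithops values from these examples:
+
+```yaml
+spec:
+  work_dir: "s3://cubed-$USER-temp"  # cloud storage for intermediate data
+  allowed_mem: "2GB"                 # memory available to each task
+  executor_name: "lithops"           # or "modal", "coiled", ...
+```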
+## Set up -## Lithops (AWS Lambda, S3) +Follow the instructions for setting up Cubed to run with your chosen executor and cloud: -See [Lithops/aws-lambda](lithops/aws-lambda/README.md) +| Executor | Cloud | Set up instructions | +|----------|--------|--------------------------------------------------------------| +| Lithops | AWS | [lithops/aws-lambda/README.md](lithops/aws-lambda/README.md) | +| | Google | [lithops/gcf/README.md](lithops/gcf/README.md) | +| Modal | AWS | [modal/aws/README.md](modal/aws/README.md) | +| | Google | [modal/gcp/README.md](modal/gcp/README.md) | +| Coiled | AWS | [coiled/aws/README.md](coiled/aws/README.md) | +| Beam | Google | [dataflow/README.md](dataflow/README.md) | -## Lithops (Google Cloud Functions, GCS) +## Examples -See [Lithops/gcf](lithops/gcf/README.md) +The `add-asarray.py` script is a small example that adds two 4x4 arrays together, and is useful for checking that the runtime is working. +Export `CUBED_CONFIG` as described in the set up instructions, then run the script. This is for Lithops on AWS: -## Modal (AWS, S3) +```shell +export CUBED_CONFIG=$(pwd)/lithops/aws-lambda +python add-asarray.py +``` -See [Modal/aws](modal/aws/README.md) +If successful it should print a 4x4 array. -## Apache Beam (Google Cloud Dataflow) +The other examples are run in a similar way: -See [Dataflow](dataflow/README.md) +```shell +export CUBED_CONFIG=... +python add-random.py +``` + +and + +```shell +export CUBED_CONFIG=... +python matmul-random.py +``` + +These will take longer to run as they operate on more data. + +The last two examples use `TimelineVisualizationCallback`, which produces a plot showing the timeline of events in the task lifecycle. +The plots are SVG files and are written to a timestamped directory under `history`.
Open the latest one with + +```shell +open $(ls -d history/compute-* | tail -1)/timeline.svg +``` diff --git a/examples/add-asarray.py b/examples/add-asarray.py new file mode 100644 index 000000000..adc2f9c65 --- /dev/null +++ b/examples/add-asarray.py @@ -0,0 +1,14 @@ +import cubed.array_api as xp + +if __name__ == "__main__": + a = xp.asarray( + [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], + chunks=(2, 2), + ) + b = xp.asarray( + [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], + chunks=(2, 2), + ) + c = xp.add(a, b) + res = c.compute() + print(res) diff --git a/examples/add-random.py b/examples/add-random.py new file mode 100644 index 000000000..0cde4ded4 --- /dev/null +++ b/examples/add-random.py @@ -0,0 +1,27 @@ +import logging + +import cubed +import cubed.array_api as xp +import cubed.random +from cubed.extensions.history import HistoryCallback +from cubed.extensions.rich import RichProgressBar +from cubed.extensions.timeline import TimelineVisualizationCallback + +# suppress harmless connection pool warnings +logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR) + +if __name__ == "__main__": + # 200MB chunks + a = cubed.random.random((50000, 50000), chunks=(5000, 5000)) + b = cubed.random.random((50000, 50000), chunks=(5000, 5000)) + c = xp.add(a, b) + + progress = RichProgressBar() + hist = HistoryCallback() + timeline_viz = TimelineVisualizationCallback() + # use store=None to write to temporary zarr + cubed.to_zarr( + c, + store=None, + callbacks=[progress, hist, timeline_viz], + ) diff --git a/examples/coiled/aws/README.md b/examples/coiled/aws/README.md index 4427cc829..44f0461a2 100644 --- a/examples/coiled/aws/README.md +++ b/examples/coiled/aws/README.md @@ -12,23 +12,17 @@ 3. Install a Python environment with the coiled package in it by running the following from this directory: ```shell -conda create -n cubed-coiled-examples python=3.9 -y -conda activate cubed-coiled-examples +conda create --name cubed-coiled-aws-examples -y python=3.10 +conda activate cubed-coiled-aws-examples pip install 'cubed[coiled]' ``` ## Examples -Start with the simplest example: +Before running the examples, first change to the top-level examples directory (`cd ../..`) and type ```shell -python coiled-add-asarray.py "s3://cubed-$USER-temp" +export CUBED_CONFIG=$(pwd)/coiled/aws ``` -If successful it should print a 4x4 matrix. - -Run the other example in a similar way - -```shell -python coiled-add-random.py "s3://cubed-modal-$USER-temp" -``` +Then you can run the examples described [there](../../README.md). 
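+
+The Coiled executor options for these examples are set in `cubed.yaml` in this directory; a short excerpt (see the file itself for the full list):
+
+```yaml
+spec:
+  executor_options:
+    minimum_workers: 10         # cluster will adapt to this minimum size
+    memory: ["2 GiB", "8 GiB"]  # lower value must be at least allowed_mem
+```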
diff --git a/examples/coiled/aws/coiled-add-asarray.py b/examples/coiled/aws/coiled-add-asarray.py deleted file mode 100644 index 4d65125bb..000000000 --- a/examples/coiled/aws/coiled-add-asarray.py +++ /dev/null @@ -1,29 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -from cubed.runtime.executors.coiled import CoiledFunctionsDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=100000) - executor = CoiledFunctionsDagExecutor() - a = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - b = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - c = xp.add(a, b) - res = c.compute( - executor=executor, - memory=["1 GiB", "8 GiB"], # memory range, lower value must be at least allowed_mem - spot_policy="spot_with_fallback", # recommended - account=None, # use your default account (or change to use a specific account) - keepalive="30 seconds", # change this to keep clusters alive longer - ) - print(res) diff --git a/examples/coiled/aws/coiled-add-random.py b/examples/coiled/aws/coiled-add-random.py deleted file mode 100644 index adcd8ae1c..000000000 --- a/examples/coiled/aws/coiled-add-random.py +++ /dev/null @@ -1,36 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar, std_out_err_redirect_tqdm -from cubed.runtime.executors.coiled import CoiledFunctionsDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = CoiledFunctionsDagExecutor() - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.add(a, b) - with std_out_err_redirect_tqdm() as orig_stdout: - progress = TqdmProgressBar(file=orig_stdout, dynamic_ncols=True) - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - c, - store=None, - executor=executor, - callbacks=[progress, hist, timeline_viz], - memory=["2 GiB", "8 GiB"], # memory range - spot_policy="spot_with_fallback", # recommended - account=None, # use your default account (or change to use a specific account) - keepalive="30 seconds", # change this to keep clusters alive longer - ) diff --git a/examples/coiled/aws/cubed.yaml b/examples/coiled/aws/cubed.yaml new file mode 100644 index 000000000..148c98f0d --- /dev/null +++ b/examples/coiled/aws/cubed.yaml @@ -0,0 +1,10 @@ +spec: + work_dir: "s3://cubed-$USER-temp" + allowed_mem: "2GB" + executor_name: "coiled" + executor_options: + minimum_workers: 10 # cluster will adapt to this minimum size + memory: ["2 GiB", "8 GiB"] # memory range, lower value must be at least allowed_mem + spot_policy: "spot_with_fallback" # recommended + account: null # use your default account (or change to use a specific account) + keepalive: "5 minutes" # change this to keep clusters alive longer diff --git a/examples/lithops/aws-lambda/README.md b/examples/lithops/aws-lambda/README.md index 8cb928bf7..e4a8b3465 100644 --- a/examples/lithops/aws-lambda/README.md +++ b/examples/lithops/aws-lambda/README.md @@ -35,35 +35,13 @@ ulimit -n 1024 ## Running 
-Start with the simplest example: +Before running the examples, first change to the top-level examples directory (`cd ../..`) and type ```shell -python lithops-add-asarray.py "s3://cubed-$USER-temp" cubed-runtime +export CUBED_CONFIG=$(pwd)/lithops/aws-lambda ``` -If successful it should print a 4x4 matrix. - -Run the other examples in a similar way - -```shell -python lithops-add-random.py "s3://cubed-$USER-temp" cubed-runtime -``` - -and - -```shell -python lithops-matmul-random.py "s3://cubed-$USER-temp" cubed-runtime -``` - -These will take longer to run as they operate on more data. - - -The last two examples use `TimelineVisualizationCallback` which produce a plot showing the timeline of events in the task lifecycle. -The plots are `png` files and are written in the `history` directory in a directory with a timestamp. Open the latest one with - -```shell -open $(ls -d history/compute-* | tail -1)/timeline.png -``` +Then you can run the examples described [there](../../README.md). ## Cleaning up diff --git a/examples/lithops/aws-lambda/cubed.yaml b/examples/lithops/aws-lambda/cubed.yaml new file mode 100644 index 000000000..741e8b822 --- /dev/null +++ b/examples/lithops/aws-lambda/cubed.yaml @@ -0,0 +1,7 @@ +spec: + work_dir: "s3://cubed-$USER-temp" + allowed_mem: "2GB" + executor_name: "lithops" + executor_options: + runtime: "cubed-runtime-dev" + runtime_memory: 2000 diff --git a/examples/lithops/aws-lambda/lithops-add-asarray.py b/examples/lithops/aws-lambda/lithops-add-asarray.py deleted file mode 100644 index 4bdcc75c5..000000000 --- a/examples/lithops/aws-lambda/lithops-add-asarray.py +++ /dev/null @@ -1,24 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -from cubed.runtime.executors.lithops import LithopsDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - runtime = sys.argv[2] - spec = cubed.Spec(tmp_path, allowed_mem=100000) - executor = LithopsDagExecutor() - a = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - b = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - c = xp.add(a, b) - res = c.compute(executor=executor, runtime=runtime, runtime_memory=2000) - print(res) diff --git a/examples/lithops/aws-lambda/lithops-add-random.py b/examples/lithops/aws-lambda/lithops-add-random.py deleted file mode 100644 index 09475ad4d..000000000 --- a/examples/lithops/aws-lambda/lithops-add-random.py +++ /dev/null @@ -1,42 +0,0 @@ -import logging -import sys - -from tqdm.contrib.logging import logging_redirect_tqdm - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar -from cubed.runtime.executors.lithops import LithopsDagExecutor - -# suppress harmless connection pool warnings -logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR) - -if __name__ == "__main__": - tmp_path = sys.argv[1] - runtime = sys.argv[2] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = LithopsDagExecutor() - - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.add(a, b) - with logging_redirect_tqdm(): - progress = TqdmProgressBar() - hist = HistoryCallback() - timeline_viz = 
TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - c, - store=None, - executor=executor, - callbacks=[progress, hist, timeline_viz], - runtime=runtime, - runtime_memory=2000, - ) diff --git a/examples/lithops/aws-lambda/lithops-matmul-random.py b/examples/lithops/aws-lambda/lithops-matmul-random.py deleted file mode 100644 index 03b5bc97a..000000000 --- a/examples/lithops/aws-lambda/lithops-matmul-random.py +++ /dev/null @@ -1,44 +0,0 @@ -import logging -import sys - -from tqdm.contrib.logging import logging_redirect_tqdm - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar -from cubed.runtime.executors.lithops import LithopsDagExecutor - -# suppress harmless connection pool warnings -logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR) - -if __name__ == "__main__": - tmp_path = sys.argv[1] - runtime = sys.argv[2] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = LithopsDagExecutor() - - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.astype(a, xp.float32) - d = xp.astype(b, xp.float32) - e = xp.matmul(c, d) - with logging_redirect_tqdm(): - progress = TqdmProgressBar() - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - e, - store=None, - executor=executor, - callbacks=[progress, hist, timeline_viz], - runtime=runtime, - runtime_memory=2000, - ) diff --git a/examples/lithops/aws-lambda/requirements.txt b/examples/lithops/aws-lambda/requirements.txt index f06c01092..72c62d92f 100644 --- a/examples/lithops/aws-lambda/requirements.txt +++ b/examples/lithops/aws-lambda/requirements.txt @@ -1,4 +1,4 @@ cubed lithops[aws] s3fs -tqdm +rich diff --git a/examples/lithops/gcf/README.md b/examples/lithops/gcf/README.md index be175a870..1ffd8910d 100644 --- a/examples/lithops/gcf/README.md +++ b/examples/lithops/gcf/README.md @@ -35,35 +35,13 @@ ulimit -n 1024 ## Running -Start with the simplest example: +Before running the examples, first change to the top-level examples directory (`cd ../..`) and type ```shell -python lithops-add-asarray.py "gs://cubed-$USER-temp" cubed-runtime +export CUBED_CONFIG=$(pwd)/lithops/gcf ``` -If successful it should print a 4x4 matrix. - -Run the other examples in a similar way - -```shell -python lithops-add-random.py "gs://cubed-$USER-temp" cubed-runtime -``` - -and - -```shell -python lithops-matmul-random.py "gs://cubed-$USER-temp" cubed-runtime -``` - -These will take longer to run as they operate on more data. - - -The last two examples use `TimelineVisualizationCallback` which produce a plot showing the timeline of events in the task lifecycle. -The plots are `png` files and are written in the `history` directory in a directory with a timestamp. Open the latest one with - -```shell -open $(ls -d history/compute-* | tail -1)/timeline.png -``` +Then you can run the examples described [there](../../README.md). 
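+
+Note that Lithops on Google Cloud Functions only accepts powers of 2 for the runtime memory, which is why `cubed.yaml` in this directory sets 2048 rather than the 2000 used for AWS Lambda:
+
+```yaml
+spec:
+  executor_options:
+    runtime: "cubed-runtime-dev"
+    runtime_memory: 2048  # must be a power of 2 for Google Cloud Functions
+```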
## Cleaning up diff --git a/examples/lithops/gcf/cubed.yaml b/examples/lithops/gcf/cubed.yaml new file mode 100644 index 000000000..3df3be1ae --- /dev/null +++ b/examples/lithops/gcf/cubed.yaml @@ -0,0 +1,7 @@ +spec: + work_dir: "gs://cubed-$USER-temp" + allowed_mem: "2GB" + executor_name: "lithops" + executor_options: + runtime: "cubed-runtime-dev" + runtime_memory: 2048 diff --git a/examples/lithops/gcf/lithops-add-asarray.py b/examples/lithops/gcf/lithops-add-asarray.py deleted file mode 100644 index 10da78f4d..000000000 --- a/examples/lithops/gcf/lithops-add-asarray.py +++ /dev/null @@ -1,28 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -from cubed.runtime.executors.lithops import LithopsDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - runtime = sys.argv[2] - spec = cubed.Spec(tmp_path, allowed_mem=100000) - executor = LithopsDagExecutor() - a = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - b = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - c = xp.add(a, b) - res = c.compute( - executor=executor, - runtime=runtime, - runtime_memory=2048, # Note that Lithops/Google Cloud Functions only accepts powers of 2 for this argument. - ) - print(res) diff --git a/examples/lithops/gcf/lithops-add-random.py b/examples/lithops/gcf/lithops-add-random.py deleted file mode 100644 index 478bc1886..000000000 --- a/examples/lithops/gcf/lithops-add-random.py +++ /dev/null @@ -1,42 +0,0 @@ -import logging -import sys - -from tqdm.contrib.logging import logging_redirect_tqdm - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar -from cubed.runtime.executors.lithops import LithopsDagExecutor - -# suppress harmless connection pool warnings -logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR) - -if __name__ == "__main__": - tmp_path = sys.argv[1] - runtime = sys.argv[2] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = LithopsDagExecutor() - - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.add(a, b) - with logging_redirect_tqdm(): - progress = TqdmProgressBar() - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - c, - store=None, - executor=executor, - callbacks=[progress, hist, timeline_viz], - runtime=runtime, - runtime_memory=2048, - ) diff --git a/examples/lithops/gcf/lithops-matmul-ones.py b/examples/lithops/gcf/lithops-matmul-ones.py deleted file mode 100644 index b2a7e91b9..000000000 --- a/examples/lithops/gcf/lithops-matmul-ones.py +++ /dev/null @@ -1,38 +0,0 @@ -import logging -import sys - -from tqdm.contrib.logging import logging_redirect_tqdm - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar -from cubed.runtime.executors.lithops import LithopsDagExecutor - -# suppress harmless connection pool warnings -logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR) - -if __name__ == 
"__main__": - tmp_path = sys.argv[1] - runtime = sys.argv[2] - spec = cubed.Spec(tmp_path, allowed_mem="2GB") - executor = LithopsDagExecutor() - - # Note we use default float dtype, since np.matmul is not optimized for ints - a = xp.ones((50000, 50000), chunks=(5000, 5000), spec=spec) - b = xp.ones((50000, 50000), chunks=(5000, 5000), spec=spec) - c = xp.matmul(a, b) - d = xp.all(c == 50000) - with logging_redirect_tqdm(): - progress = TqdmProgressBar() - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - res = d.compute( - executor=executor, - callbacks=[progress, hist, timeline_viz], - runtime=runtime, - runtime_memory=2048, # Note that Lithops/Google Cloud Functions only accepts powers of 2 for this argument. - ) - assert res, "Validation failed" diff --git a/examples/lithops/gcf/lithops-matmul-random.py b/examples/lithops/gcf/lithops-matmul-random.py deleted file mode 100644 index 8f7c02ba9..000000000 --- a/examples/lithops/gcf/lithops-matmul-random.py +++ /dev/null @@ -1,44 +0,0 @@ -import logging -import sys - -from tqdm.contrib.logging import logging_redirect_tqdm - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar -from cubed.runtime.executors.lithops import LithopsDagExecutor - -# suppress harmless connection pool warnings -logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR) - -if __name__ == "__main__": - tmp_path = sys.argv[1] - runtime = sys.argv[2] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = LithopsDagExecutor() - - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.astype(a, xp.float32) - d = xp.astype(b, xp.float32) - e = xp.matmul(c, d) - with logging_redirect_tqdm(): - progress = TqdmProgressBar() - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - e, - store=None, - executor=executor, - callbacks=[progress, hist, timeline_viz], - runtime=runtime, - runtime_memory=2048, - ) diff --git a/examples/lithops/gcf/requirements.txt b/examples/lithops/gcf/requirements.txt index 10b767ef3..eca236a50 100644 --- a/examples/lithops/gcf/requirements.txt +++ b/examples/lithops/gcf/requirements.txt @@ -1,4 +1,4 @@ cubed lithops[gcp] gcsfs -tqdm +rich diff --git a/examples/matmul-random.py b/examples/matmul-random.py new file mode 100644 index 000000000..2e68cab68 --- /dev/null +++ b/examples/matmul-random.py @@ -0,0 +1,29 @@ +import logging + +import cubed +import cubed.array_api as xp +import cubed.random +from cubed.extensions.history import HistoryCallback +from cubed.extensions.rich import RichProgressBar +from cubed.extensions.timeline import TimelineVisualizationCallback + +# suppress harmless connection pool warnings +logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR) + +if __name__ == "__main__": + # 200MB chunks + a = cubed.random.random((50000, 50000), chunks=(5000, 5000)) + b = cubed.random.random((50000, 50000), chunks=(5000, 5000)) + c = xp.astype(a, xp.float32) + d = xp.astype(b, xp.float32) + e = xp.matmul(c, d) + + progress = RichProgressBar() + hist = HistoryCallback() + timeline_viz = TimelineVisualizationCallback() + # use store=None to write to temporary zarr + 
cubed.to_zarr( + e, + store=None, + callbacks=[progress, hist, timeline_viz], + ) diff --git a/examples/modal/aws/README.md b/examples/modal/aws/README.md index 5c8a1e5fe..1f2000d15 100644 --- a/examples/modal/aws/README.md +++ b/examples/modal/aws/README.md @@ -1,6 +1,6 @@ # Examples running Cubed on Modal -**Warning: Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are possible.** +**Warning: Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are likely.** ## Pre-requisites @@ -22,31 +22,10 @@ export CUBED_MODAL_REQUIREMENTS_FILE=$(pwd)/requirements.txt ## Examples -Start with the simplest example: +Before running the examples, first change to the top-level examples directory (`cd ../..`) and type ```shell -python modal-add-asarray.py "s3://cubed-modal-$USER-temp" +export CUBED_CONFIG=$(pwd)/modal/aws ``` -If successful it should print a 4x4 matrix. - -Run the other examples in a similar way - -```shell -python modal-add-random.py "s3://cubed-modal-$USER-temp" -``` - -and - -```shell -python modal-matmul-random.py "s3://cubed-modal-$USER-temp" -``` - -These will take longer to run as they operate on more data. - -The last two examples use `TimelineVisualizationCallback` which produce a plot showing the timeline of events in the task lifecycle. -The plots are `png` files and are written in the `history` directory in a directory with a timestamp. Open the latest one with - -```shell -open $(ls -d history/compute-* | tail -1)/timeline.png -``` +Then you can run the examples described [there](../../README.md). 
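+
+The `cubed.yaml` in this directory selects the Modal executor, with the `cloud` option telling Modal to run its functions on AWS:
+
+```yaml
+spec:
+  executor_name: "modal"
+  executor_options:
+    cloud: "aws"  # Modal can also run on GCP (see ../gcp)
+```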
diff --git a/examples/modal/aws/cubed.yaml b/examples/modal/aws/cubed.yaml new file mode 100644 index 000000000..b93ad7c6d --- /dev/null +++ b/examples/modal/aws/cubed.yaml @@ -0,0 +1,6 @@ +spec: + work_dir: "s3://cubed-$USER-temp" + allowed_mem: "2GB" + executor_name: "modal" + executor_options: + cloud: "aws" diff --git a/examples/modal/aws/modal-add-asarray.py b/examples/modal/aws/modal-add-asarray.py deleted file mode 100644 index 26c4baa3c..000000000 --- a/examples/modal/aws/modal-add-asarray.py +++ /dev/null @@ -1,26 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -from cubed.extensions.tqdm import TqdmProgressBar, std_out_err_redirect_tqdm -from cubed.runtime.executors.modal_async import AsyncModalDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=100000) - executor = AsyncModalDagExecutor() - a = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - b = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - c = xp.add(a, b) - with std_out_err_redirect_tqdm() as orig_stdout: - progress = TqdmProgressBar(file=orig_stdout, dynamic_ncols=True) - res = c.compute(executor=executor, callbacks=[progress]) - print(res) diff --git a/examples/modal/aws/modal-add-random.py b/examples/modal/aws/modal-add-random.py deleted file mode 100644 index 57d4cef80..000000000 --- a/examples/modal/aws/modal-add-random.py +++ /dev/null @@ -1,29 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar, std_out_err_redirect_tqdm -from cubed.runtime.executors.modal_async import AsyncModalDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = AsyncModalDagExecutor() - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.add(a, b) - with std_out_err_redirect_tqdm() as orig_stdout: - progress = TqdmProgressBar(file=orig_stdout, dynamic_ncols=True) - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - c, store=None, executor=executor, callbacks=[progress, hist, timeline_viz] - ) diff --git a/examples/modal/aws/modal-matmul-random.py b/examples/modal/aws/modal-matmul-random.py deleted file mode 100644 index 076301a97..000000000 --- a/examples/modal/aws/modal-matmul-random.py +++ /dev/null @@ -1,31 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar, std_out_err_redirect_tqdm -from cubed.runtime.executors.modal_async import AsyncModalDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = AsyncModalDagExecutor() - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.astype(a, xp.float32) - 
d = xp.astype(b, xp.float32) - e = xp.matmul(c, d) - with std_out_err_redirect_tqdm() as orig_stdout: - progress = TqdmProgressBar(file=orig_stdout, dynamic_ncols=True) - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - e, store=None, executor=executor, callbacks=[progress, hist, timeline_viz] - ) diff --git a/examples/modal/aws/requirements.txt b/examples/modal/aws/requirements.txt index 09394ad34..b8fa061f4 100644 --- a/examples/modal/aws/requirements.txt +++ b/examples/modal/aws/requirements.txt @@ -1,3 +1,3 @@ cubed s3fs -tqdm +rich diff --git a/examples/modal/gcp/README.md b/examples/modal/gcp/README.md index 1d53411dd..af87f6b75 100644 --- a/examples/modal/gcp/README.md +++ b/examples/modal/gcp/README.md @@ -1,6 +1,6 @@ # Examples running Cubed on Modal -**Warning: Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are possible.** +**Warning: Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are likely.** ## Pre-requisites @@ -22,31 +22,10 @@ export CUBED_MODAL_REQUIREMENTS_FILE=$(pwd)/requirements.txt ## Examples -Start with the simplest example: +Before running the examples, first change to the top-level examples directory (`cd ../..`) and type ```shell -python modal-add-asarray.py "gs://cubed-modal-$USER-temp" +export CUBED_CONFIG=$(pwd)/modal/gcp ``` -If successful it should print a 4x4 matrix. - -Run the other examples in a similar way - -```shell -python modal-add-random.py "gs://cubed-modal-$USER-temp" -``` - -and - -```shell -python modal-matmul-random.py "gs://cubed-modal-$USER-temp" -``` - -These will take longer to run as they operate on more data. - -The last two examples use `TimelineVisualizationCallback` which produce a plot showing the timeline of events in the task lifecycle. -The plots are `png` files and are written in the `history` directory in a directory with a timestamp. Open the latest one with - -```shell -open $(ls -d history/compute-* | tail -1)/timeline.png -``` +Then you can run the examples described [there](../../README.md). 
diff --git a/examples/modal/gcp/cubed.yaml b/examples/modal/gcp/cubed.yaml new file mode 100644 index 000000000..10d6260f1 --- /dev/null +++ b/examples/modal/gcp/cubed.yaml @@ -0,0 +1,6 @@ +spec: + work_dir: "gs://cubed-$USER-temp" + allowed_mem: "2GB" + executor_name: "modal" + executor_options: + cloud: "gcp" diff --git a/examples/modal/gcp/modal-add-asarray.py b/examples/modal/gcp/modal-add-asarray.py deleted file mode 100644 index cfe5d1431..000000000 --- a/examples/modal/gcp/modal-add-asarray.py +++ /dev/null @@ -1,26 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -from cubed.extensions.tqdm import TqdmProgressBar, std_out_err_redirect_tqdm -from cubed.runtime.executors.modal_async import AsyncModalDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=100000) - executor = AsyncModalDagExecutor(cloud="gcp") - a = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - b = xp.asarray( - [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], - chunks=(2, 2), - spec=spec, - ) - c = xp.add(a, b) - with std_out_err_redirect_tqdm() as orig_stdout: - progress = TqdmProgressBar(file=orig_stdout, dynamic_ncols=True) - res = c.compute(executor=executor, callbacks=[progress]) - print(res) diff --git a/examples/modal/gcp/modal-add-random.py b/examples/modal/gcp/modal-add-random.py deleted file mode 100644 index 56c86794c..000000000 --- a/examples/modal/gcp/modal-add-random.py +++ /dev/null @@ -1,29 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar, std_out_err_redirect_tqdm -from cubed.runtime.executors.modal_async import AsyncModalDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = AsyncModalDagExecutor(cloud="gcp") - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - c = xp.add(a, b) - with std_out_err_redirect_tqdm() as orig_stdout: - progress = TqdmProgressBar(file=orig_stdout, dynamic_ncols=True) - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - c, store=None, executor=executor, callbacks=[progress, hist, timeline_viz] - ) diff --git a/examples/modal/gcp/modal-matmul-random.py b/examples/modal/gcp/modal-matmul-random.py deleted file mode 100644 index fc068bb59..000000000 --- a/examples/modal/gcp/modal-matmul-random.py +++ /dev/null @@ -1,31 +0,0 @@ -import sys - -import cubed -import cubed.array_api as xp -import cubed.random -from cubed.extensions.history import HistoryCallback -from cubed.extensions.timeline import TimelineVisualizationCallback -from cubed.extensions.tqdm import TqdmProgressBar, std_out_err_redirect_tqdm -from cubed.runtime.executors.modal_async import AsyncModalDagExecutor - -if __name__ == "__main__": - tmp_path = sys.argv[1] - spec = cubed.Spec(tmp_path, allowed_mem=2_000_000_000) - executor = AsyncModalDagExecutor(cloud="gcp") - a = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks - b = cubed.random.random( - (50000, 50000), chunks=(5000, 5000), spec=spec - ) # 200MB chunks 
- c = xp.astype(a, xp.float32) - d = xp.astype(b, xp.float32) - e = xp.matmul(c, d) - with std_out_err_redirect_tqdm() as orig_stdout: - progress = TqdmProgressBar(file=orig_stdout, dynamic_ncols=True) - hist = HistoryCallback() - timeline_viz = TimelineVisualizationCallback() - # use store=None to write to temporary zarr - cubed.to_zarr( - e, store=None, executor=executor, callbacks=[progress, hist, timeline_viz] - ) diff --git a/examples/modal/gcp/requirements.txt b/examples/modal/gcp/requirements.txt index 2ba9798d4..947355ced 100644 --- a/examples/modal/gcp/requirements.txt +++ b/examples/modal/gcp/requirements.txt @@ -1,3 +1,3 @@ cubed gcsfs -tqdm +rich