The following two files showcase how to process large dataframes using the `@metaflow_ray` decorator with `@kubernetes`.
1. `utils.py` contains a remote function called `process_dataframe_chunk`, which is used in `process_dataframe`.
    - The usage (/user code) of this is present in `__main__` and can be verified by running `python examples/dataframe_process/utils.py`
2. `flow.py` shows how the usage (/user code) for `process_dataframe` (after importing it from `utils.py`) can now be moved within the `execute` step of `RayDFProcessFlow`.
    - This flow can now be run with `python examples/dataframe_process/flow.py --no-pylint --environment=pypi run`
    - If you are on the Outerbounds platform, you can leverage `fast-bakery` for blazingly fast Docker image builds by running `python examples/dataframe_process/flow.py --no-pylint --environment=fast-bakery run`