Could not use cudf or cuml when rapids-runtime = DASK #1039

Open
blis-teng opened this issue Dec 14, 2022 · 26 comments

@blis-teng
Copy link

I am trying to set up a Dataproc cluster with GPUs attached in order to use cuml and cudf. I followed the instructions at https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/rapids/README.md
and was able to set up the cluster with the NVIDIA driver installed successfully. But when I try

import cudf

it throws this error:

TypeError: C function cuda.ccudart.cudaStreamSynchronize has wrong signature (expected __pyx_t_4cuda_7ccudart_cudaError_t (__pyx_t_4cuda_7ccudart_cudaStream_t), got cudaError_t (cudaStream_t))

I followed the instructions here: https://docs.rapids.ai/notices/rsn0020/
But after the downgrade, another error shows up when importing cudf:

No module named 'pandas.core.arrays._arrow_utils'

The Dask RAPIDS version installed by rapids.sh is 22.04.
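
For context, the workaround in that notice amounts to constraining cuda-python to an older release in the cluster's conda environment. A rough sketch of that downgrade (the constraint shown here is illustrative; the exact supported version is in RSN0020):

# pin cuda-python so its compiled signatures match what cudf 22.04 expects
# (illustrative constraint; RSN0020 has the exact version to use)
mamba install -y -c conda-forge "cuda-python<=11.7.0"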

cjac (Contributor) commented Dec 15, 2022

Thank you for this report!

@nvliyuan do you want to take a look at this?

cjac (Contributor) commented Dec 15, 2022

Actually, nvliyuan has been contributing to the Spark runtime; I'm not certain who to tap about the Dask runtime. I'll check the commit history shortly and get back to you.

nvliyuan (Contributor) commented Dec 15, 2022

Hi @cjac, the dask script has been failing since the 22.06 release (2022.06); see this comment. So I believe this issue has existed for a long time. Maybe @mengdong @sameerz could bring in some dask-rapids folks?

jacobtomlinson commented Dec 16, 2022

Hey folks! I work on RAPIDS and Dask, happy to help. We are currently in the process of documenting and testing RAPIDS deployments on cloud platforms, but I expect we will not get to Dataproc until after the holidays. We will definitely dig into this as part of that work.

Pinging @mroeschke who may have some quick thoughts about the Pandas error. I expect pandas needs upgrading/downgrading.

@mroeschke

I suspect your environment has pandas>=1.5 installed, and cudf was not compatible with that version of pandas until 22.10.

Therefore, if you downgrade to pandas<1.5 or upgrade to cudf>=22.10, the error No module named 'pandas.core.arrays._arrow_utils' should go away.
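
For example, something along these lines in a conda/mamba-managed environment (channels and exact pins here are illustrative):

# option 1: keep the installed cudf and pin pandas below 1.5
mamba install -y -c conda-forge "pandas<1.5"

# option 2: move to a cudf release that supports pandas 1.5
mamba install -y -c rapidsai -c nvidia -c conda-forge "cudf>=22.10"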

cjac (Contributor) commented Dec 17, 2022

Thank you Jacob and Matt!

@blis-teng - please let us know if this solves the issue for you so we can mark it resolved or otherwise offer an appropriate solution.

cjac self-assigned this Dec 17, 2022

cjac (Contributor) commented Dec 19, 2022

@blis-teng - are you able to share the gcloud dataproc clusters create command you're using to spin up your cluster? I can try to reproduce it and see if I run into the same problems.

If you've got a support contract with GCP, I'd appreciate it if you could open a support case and provide me the case #. That way we can track our work and share case details privately rather than on the permanent record of the initialization-actions repository. Please do not open development cases as P2 or P1; those are reserved for production outage situations, and development is by definition not a production environment.

C.J. in Cloud Support, Seattle

@blis-teng (Author)

> I suspect your environment has pandas>=1.5 installed, and cudf was not compatible with that version of pandas until 22.10.
>
> Therefore, if you downgrade to pandas<1.5 or upgrade to cudf>=22.10, the error No module named 'pandas.core.arrays._arrow_utils' should go away.

I have tried this, but it does not work:

  1. If I pin pandas 1.3 in rapids.sh, running "conda list" on the Dataproc cluster still shows version 1.5; the import error changes, but it is still related to pandas.
  2. If I try to install pandas 1.3 from a Jupyter notebook after the cluster is ready, the "mamba install" blocks because some dependencies cannot be resolved from any of the given channels.

@blis-teng (Author)

I used the command line from the documentation at https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/rapids/README.md
Minor details may differ, but the key parameters (GPU driver, rapids-runtime) are the same.

export CLUSTER_NAME=<cluster_name>
export GCS_BUCKET=<your bucket for the logs and notebooks>
export REGION=<region>
export NUM_GPUS=1
export NUM_WORKERS=2

gcloud dataproc clusters create $CLUSTER_NAME  \
    --region $REGION \
    --image-version=dp20 \
    --master-machine-type n1-custom-63500 \
    --num-workers $NUM_WORKERS \
    --worker-accelerator type=nvidia-tesla-t4,count=$NUM_GPUS \
    --worker-machine-type n1-standard-8 \
    --num-worker-local-ssds 1 \
    --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/gpu/install_gpu_driver.sh,gs://goog-dataproc-initialization-actions-${REGION}/rapids/rapids.sh \
    --optional-components=JUPYTER,ZEPPELIN \
    --metadata gpu-driver-provider="NVIDIA",rapids-runtime="DASK" \
    --bucket $GCS_BUCKET

cjac (Contributor) commented Dec 28, 2022

Okay, I'll try to reproduce it now.

cjac (Contributor) commented Dec 28, 2022

With these arguments, it is installing pandas-1.2.5 and libcudf-22.04.00-cuda11. I think I found a bug in the rapids.sh script. I'll see if patching it improves the situation.
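
For reference, I'm reading those versions off a cluster node with something like this (run in whatever conda environment rapids.sh populated):

# shows the pandas and cudf builds that the initialization action installed
conda list | grep -E '^(pandas|libcudf|cudf) '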

cjac (Contributor) commented Dec 29, 2022

In order to use 22.10 with pandas>=1.5, I need to upgrade these python packages:

"cuspatial=${CUSPATIAL_VERSION}" "rope>=0.9.4" "gdal>3.5.0"

And gdal>3.5.0 is not available in bullseye. Backports only go up to 3.2, so I'm going to try ubuntu20.

cjac (Contributor) commented Dec 29, 2022

cjac@cluster-1668020639-w-0:~$ apt-cache show libgdal-dev | grep ^Version
Version: 3.0.4+dfsg-1build3
cjac@cluster-1668020639-w-0:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal

cjac (Contributor) commented Dec 29, 2022

So no, it looks like pandas>=1.5 is not going to work here. I'll try the lower version numbers.

cjac (Contributor) commented Dec 29, 2022

+ mamba install -y --no-channel-priority -c conda-forge -c nvidia -c rapidsai cudatoolkit=11.5 'pandas<1.5' rapids=22.04

Looking for: ['cudatoolkit=11.5', "pandas[version='<1.5']", 'rapids=22.04']


Pinned packages:
  - python 3.10.*
  - conda 22.9.*
  - python 3.10.*
  - r-base 4.1.*
  - r-recommended 4.1.*


Encountered problems while solving:
  - package rapids-22.04.00-cuda11_py39_ge08d166_149 requires python >=3.9,<3.10.0a0, but none of the providers can be installed

cjac@cluster-1668020639-w-0:~$ which conda
/opt/conda/default/bin/conda
cjac@cluster-1668020639-w-0:~$ /opt/conda/default/bin/python --version
Python 3.10.8

Now it looks like the Python interpreter we install with Dataproc is too new for the RAPIDS release. I'll try 22.06 and 22.08 to see if either of those versions works.
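
A quick way to see which Python a given rapids release was built against is to check the build strings published in the channel, since they encode it (cuda11_py39_..., as in the solver output above); for example:

# list the available rapids 22.08 builds and their build strings
conda search -c rapidsai 'rapids=22.08'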

cjac (Contributor) commented Dec 29, 2022

Okay, I was able to get this working on 2.0-debian10 with dask-rapids 22.06

I had to specify this mamba command:

mamba install -n 'dask-rapids' -y --no-channel-priority -c 'conda-forge' -c 'nvidia' -c 'rapidsai' \
  "cudatoolkit=${CUDA_VERSION}" "pandas<1.5" "rapids=${RAPIDS_VERSION}" "python=3.9"

I'm testing the change with dask-rapids 22.08; if that works as well, I will submit a PR.
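
A quick import test in the new environment is enough to confirm the packages resolve, for example:

# verify cudf and cuml import cleanly inside the dask-rapids environment
conda run -n dask-rapids python -c "import cudf, cuml; print(cudf.__version__, cuml.__version__)"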

cjac added a commit to cjac/initialization-actions that referenced this issue Dec 29, 2022

cjac (Contributor) commented Dec 30, 2022

@blis-teng - please try replacing the rapids.sh you reference from your project's initialization-actions checkout with this one:

https://github.com/cjac/initialization-actions/raw/dask-rapids-202212/rapids/rapids.sh

I am working with the product team to review this change. I should be able to close out PR #1041 pretty quickly.

cjac (Contributor) commented Dec 30, 2022

It sounds like you may not yet have read the README.md[1] from the initialization-actions repository. Can you please review it and confirm that you understand where to copy rapids.sh[2] from my pre-release branch for testing?

[1] https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/README.md#how-initialization-actions-are-used
[2] https://github.com/cjac/initialization-actions/raw/dask-rapids-202212/rapids/rapids.sh
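
In short, the README's pattern is to stage initialization actions in a bucket you control and point --initialization-actions at that copy. For the pre-release script above, that would look roughly like this:

# fetch the pre-release rapids.sh and stage it in a bucket you own
curl -L -o rapids.sh https://github.com/cjac/initialization-actions/raw/dask-rapids-202212/rapids/rapids.sh
gsutil cp rapids.sh gs://<your-bucket>/rapids/rapids.sh
# then pass gs://<your-bucket>/rapids/rapids.sh (alongside the GPU driver script) to --initialization-actions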

cjac added a commit that referenced this issue Jan 3, 2023
* Update to work with dask-rapids 10.06

Fix for issue #1039

* Incremental changes

* 22.08 tested on 2.0-debian10

* added -m argument to mamba install ; previous test included
  dataproc:conda.env.config.uri which pre-defined the environment

* tested with rapids version 22.10

* rapids works with rocky

cjac (Contributor) commented Jan 5, 2023

@blis-teng can you re-try using the latest rapids/rapids.sh from GitHub?

@blis-teng (Author)

Hi @cjac, sorry for the late reply. I will re-try the new rapids.sh and get back to you next week, thanks!

cjac (Contributor) commented Jan 6, 2023

Thank you. Standing by for confirmation!
20230106T084758 + 7d will be 20230113T084757.

I am presently not able to reproduce your problem. If there is still a change to be made, I'd like to know that information early in the week, please.

cjac (Contributor) commented Jan 8, 2023

Please remember to read the README I referenced. You are violating its guidance by using:

--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/gpu/install_gpu_driver.sh,gs://goog-dataproc-initialization-actions-${REGION}/rapids/rapids.sh \

@skirui-source

Hi @cjac, could you please update rapids.sh to work with the latest dask-rapids, v22.12?

cjac (Contributor) commented Jan 20, 2023

Not last I checked. What versions are you pinning to?

skirui-source commented Jan 25, 2023

@cjac

root@test-dataproc-rapids-dask-m:/# conda list ^cu
# packages in environment at /opt/conda/miniconda3:
#
# Name                    Version                   Build  Channel
cucim                     22.04.00        cuda_11_py38_g8dfed80_0    rapidsai
cuda-python               11.8.1           py38h241159d_2    conda-forge
cudatoolkit               11.2.72              h2bc3f7f_0    nvidia
cudf                      22.04.00        cuda_11_py38_g8bf0520170_0    rapidsai
cudf_kafka                22.04.00        py38_g8bf0520170_0    rapidsai
cugraph                   22.04.00        cuda11_py38_g58be5b53_0    rapidsai
cuml                      22.04.00        cuda11_py38_g95abbc746_0    rapidsai
cupy                      9.6.0            py38h177b0fd_0    conda-forge
cupy-cuda115              10.6.0                   pypi_0    pypi
curl                      7.86.0               h7bff187_1    conda-forge
cusignal                  22.04.00        py39_g06f58b4_0    rapidsai
cuspatial                 22.04.00        py38_ge8f9f84_0    rapidsai
custreamz                 22.04.00        py38_g8bf0520170_0    rapidsai
cuxfilter                 22.04.00        py38_gf251a67_0    rapidsai

root@test-dataproc-rapids-dask-m:/# conda list ^das
# packages in environment at /opt/conda/miniconda3:
#
# Name                    Version                   Build  Channel
dask                      2022.3.0           pyhd8ed1ab_1    conda-forge
dask-bigquery             2022.5.0           pyhd8ed1ab_0    conda-forge
dask-core                 2022.3.0           pyhd8ed1ab_0    conda-forge
dask-cuda                 22.04.00                 py38_0    rapidsai
dask-cudf                 22.04.00        cuda_11_py38_g8bf0520170_0    rapidsai
dask-glm                  0.2.0                      py_1    conda-forge
dask-ml                   2022.5.27          pyhd8ed1ab_0    conda-forge
dask-sql                  2022.8.0           pyhd8ed1ab_0    conda-forge
dask-yarn                 0.9              py38h578d9bd_2    conda-forge

I was wondering if you can upgrade rapids.sh to install the latest RAPIDS, v22.12? Or is there a reason not to? (P.S. I am aware you recently upgraded to 22.10, which I have yet to test.)

cjac (Contributor) commented Jan 25, 2023

> Hi @cjac, could you please update rapids.sh to work with the latest dask-rapids, v22.12?

I'm about to go on vacation, and I'm trying to put projects down. Can you open a new issue or better yet a GCP support case so I don't lose track of the work item, please?

This issue is about the action not working. I think it's working now, just not yet updated to the latest release. A separate issue would be appropriate.
