
GPU versioning #1

Closed
cboettig opened this issue Jan 2, 2020 · 4 comments

@cboettig
Member

cboettig commented Jan 2, 2020

Which versions of CUDA will we support, and how will we indicate which version we are on?

The NVIDIA images have a lot of tags and do releases for every minor version. We're almost surely going to be pip-installing Python binaries for TensorFlow, which are only available for certain CUDA versions anyway.

It's probably best we just have a rule that pins the CUDA version to something else (the R version being the obvious choice, like we do for everything else). We would then probably follow the same sliding version scale that TensorFlow uses; they are on 10.0 for now.

(I suppose there's also the related question of which Python version we use for the ML stack, though I think we can safely go all Python 3. That said, the RStudio build recipe somehow installs Python 2.7 anyway....)

@noamross
Collaborator

noamross commented Jan 4, 2020

Yes, RStudio installs 2.7, and I think Shiny may need it too (but possibly only when building from source?). I think that means we should install Python in a virtualenv or with Miniconda and set up TensorFlow + friends to use that environment by default.
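A minimal sketch of that default-environment idea, assuming a system-wide virtualenv and a site-wide `Renviron`; all paths and the TensorFlow pin here are illustrative assumptions, not a settled layout:

```shell
# Create a dedicated virtualenv for the ML stack (path is illustrative)
python3 -m venv /opt/venv

# Install the Python side of the stack into it (version pin is an assumption)
/opt/venv/bin/pip install --upgrade pip
/opt/venv/bin/pip install "tensorflow==2.0.0"

# Make it the default interpreter for reticulate (and hence the R
# tensorflow/keras packages) by setting RETICULATE_PYTHON site-wide
echo "RETICULATE_PYTHON=/opt/venv/bin/python" >> /usr/local/lib/R/etc/Renviron
```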

I think the only twist with CUDA versions is whether we want to support different hardware. Different CUDA versions are compatible with different hardware, so it may be worthwhile to do something like 3.6.2-cuda9, and 3.6.2-cuda10.

@cboettig
Member Author

cboettig commented Jan 6, 2020

Sounds good. Is there any documentation on the hardware-dependency side? I thought CUDA 10 was still compatible with most older NVIDIA GPUs, and it looks like at least some of the newer packages won't run on old CUDA (maybe including current TensorFlow?).

For the Python virtualenv setup, I played with that a whole bunch (though I think some of the env var handling has improved in the tensorflow R package; it used to have funny behavior where it liked having its own virtualenv separate from reticulate's). So I'm partial to the config I have in https://github.com/rocker-org/ml/blob/master/ubuntu/shared/install_python.sh and https://github.com/rocker-org/ml/blob/master/ubuntu/shared/config_R_cuda.sh, but of course open to discussion. venv / pip seems to better match what the Python folks are doing over at binder etc. as well; they definitely know their Python stack, and I think it helps to have these things aligned.

@wlandau

wlandau commented Feb 20, 2020

> I think the only twist with CUDA versions is whether we want to support different hardware. Different CUDA versions are compatible with different hardware, so it may be worthwhile to do something like 3.6.2-cuda9, and 3.6.2-cuda10.

Will the TensorFlow version factor in here too, e.g. r3.6.2-tf2.1.0-cuda10? Do you plan to make a strategic subset of R version x TF version x CUDA version?

> Sounds good. Is there any documentation on the hardware-dependency side? I thought CUDA 10 was still compatible with most older NVIDIA GPUs, and it looks like at least some of the newer packages won't run on old CUDA (maybe including current TensorFlow?).

Compatibility of versions | URL
--- | ---
GPU vs compute capability | https://developer.nvidia.com/cuda-gpus#compute
Compute capability vs CUDA SDK | https://en.wikipedia.org/wiki/CUDA#GPUs_supported
CUDA SDK vs driver | https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
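Since driver-vs-SDK mismatches are the common failure mode, here is a small, hypothetical shell helper for checking a driver version against the minimum a CUDA release needs (`version_ge` is my own name; the 440.33 minimum for CUDA 10.2 on Linux comes from NVIDIA's release-notes table linked above):

```shell
# version_ge A B: succeed if version string A >= B (uses GNU sort -V)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: CUDA 10.2 requires driver >= 440.33 on Linux
if version_ge "440.100" "440.33"; then
  echo "driver 440.100 is new enough for CUDA 10.2"
fi
```

On a real host you would feed it the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader` instead of a literal.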

> For the Python virtualenv setup, I played with that a whole bunch (though I think some of the env var handling has improved in the tensorflow R package; it used to have funny behavior where it liked having its own virtualenv separate from reticulate's). So I'm partial to the config I have in https://github.com/rocker-org/ml/blob/master/ubuntu/shared/install_python.sh and https://github.com/rocker-org/ml/blob/master/ubuntu/shared/config_R_cuda.sh, but of course open to discussion. venv / pip seems to better match what the Python folks are doing over at binder etc. as well; they definitely know their Python stack, and I think it helps to have these things aligned.

I agree. For what it's worth, here is what I have been using to set up a local venv/Miniconda build for an RStudio Cloud project. Maybe it could serve as a fallback option?

```r
install.packages("keras")
reticulate::install_miniconda("miniconda")
Sys.setenv(WORKON_HOME = "virtualenvs")
reticulate::virtualenv_create("r-reticulate", python = "miniconda/bin/python")
keras::install_keras(
  method = "virtualenv",
  conda = "miniconda/bin/conda",
  envname = "r-reticulate",
  version = "2.3.1",     # Keras version
  tensorflow = "1.13.1", # TensorFlow version
  restart_session = FALSE
)
```

@cboettig
Member Author

I do think we need to support different CUDA images; I currently have dev images for 10.0 and 10.2. We can pull these from upstream nvidia/cuda. Unfortunately they haven't released an ubuntu:focal build yet, but hopefully they'll do so soon.
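One way the per-CUDA variants could look, as a sketch: build the same recipe against two upstream bases. The `rocker/ml` tag names and the `BASE_IMAGE` build-arg are hypothetical here; the `nvidia/cuda` tags are real upstream 18.04 tags (no focal bases yet, as noted).

```shell
# Build the same image recipe against two upstream CUDA base images
docker build -t rocker/ml:cuda10.0 \
  --build-arg BASE_IMAGE=nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 .
docker build -t rocker/ml:cuda10.2 \
  --build-arg BASE_IMAGE=nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 .
```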

I've been running a few projects on my GPU machine using the 10.0 and 10.2 images, and I find each one needs its own Python virtualenv anyway to support very particular versions of TensorFlow (one runs only on TF 2.1.0, several need TF 2.0.0, and a few need TF 1.14.0). I've found this pretty easy to manage with virtualenvs (though I haven't tried using it with renv yet), so I think that will be the unavoidable way to go.
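For example, per-project environments along these lines; the paths and project names are made up, and the TF pins are the ones mentioned above:

```shell
# One virtualenv per project, each pinning its own TensorFlow version
python3 -m venv ~/venvs/proj-a && ~/venvs/proj-a/bin/pip install "tensorflow==2.1.0"
python3 -m venv ~/venvs/proj-b && ~/venvs/proj-b/bin/pip install "tensorflow==2.0.0"
python3 -m venv ~/venvs/proj-c && ~/venvs/proj-c/bin/pip install "tensorflow==1.14.0"

# Then point each R project at its env, e.g. in that project's .Renviron:
#   RETICULATE_PYTHON=~/venvs/proj-a/bin/python
```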

I think we'll still ship an ML image with a tensorflow installation in place 'out-of-the-box', probably matching the version that the R keras package installs (currently 2.0.0 by default). That should make it easy for most users to get up and running in common situations without having to think about it, but I don't think we can create separate images for every venv configuration.

I'm not sure I've found anything I'm currently working on that needs CUDA 10.0 (or worse, say CUDA 9.0, though I guess if I had anything still pinned at TensorFlow 0.12.0 we would need CUDA 9.0). CUDA lib updates are a bit of a bear since it is easy to create hardware mismatches; see notes here: rocker-org/ml#28
