
GPU versioning #1

Closed
cboettig opened this issue Jan 2, 2020 · 4 comments

@cboettig
Member

cboettig commented Jan 2, 2020

Which versions of CUDA will we support, and how will we indicate which version we are on?

The NVIDIA images have a lot of tags and do releases for every minor version. We're almost surely going to be pip-installing Python binaries for TensorFlow, which are only available for certain CUDA versions anyway.

It's probably best we just have a rule that pins the CUDA version to something else (the R version being the obvious choice, like we do for everything else). We would then probably follow the same sliding version scale that TensorFlow uses; they are on 10.0 for now.

(I suppose there's also the related question of which Python version we use for the ML stack, though I think we can safely go all Python 3. That said, the RStudio build recipe somehow installs Python 2.7 anyway....)

@noamross
Collaborator

noamross commented Jan 4, 2020

Yes, RStudio installs 2.7, and I think Shiny may need it too (but possibly only when building from source?). I think that means we should install Python in a virtualenv or with Miniconda and set up TensorFlow + friends to use that environment by default.
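A minimal sketch of that default-environment idea, assuming a system-wide virtualenv and a site-wide `Renviron`; all paths and the TensorFlow pin here are illustrative assumptions, not a settled layout:

```shell
# Create a dedicated virtualenv for the ML stack (path is illustrative)
python3 -m venv /opt/venv

# Install the Python side of the stack into it (version pin is an assumption)
/opt/venv/bin/pip install --upgrade pip
/opt/venv/bin/pip install "tensorflow==2.0.0"

# Make it the default interpreter for reticulate (and hence the R
# tensorflow/keras packages) by setting RETICULATE_PYTHON site-wide
echo "RETICULATE_PYTHON=/opt/venv/bin/python" >> /usr/local/lib/R/etc/Renviron
```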

I think the only twist with CUDA versions is whether we want to support different hardware. Different CUDA versions are compatible with different hardware, so it may be worthwhile to do something like 3.6.2-cuda9, and 3.6.2-cuda10.

@cboettig
Member Author

cboettig commented Jan 6, 2020

Sounds good. Is there any documentation on the hardware-dependency side? I thought CUDA 10 was still compatible with most older NVIDIA GPUs, and it looks like at least some of the newer packages won't run on old CUDA (maybe including current TensorFlow?).

For the Python virtualenv setup, I played with that a whole bunch (though I think some of the env var handling has improved in the tensorflow R package; it used to have funny behavior where it liked having its own virtualenv separate from reticulate's). So I'm partial to the config I have in https://github.com/rocker-org/ml/blob/master/ubuntu/shared/install_python.sh and https://github.com/rocker-org/ml/blob/master/ubuntu/shared/config_R_cuda.sh, but of course open to discussion. venv / pip seems to better match what the Python folks are doing over at binder etc. as well; they definitely know their Python stack, and I think it helps to have these things aligned.

@wlandau

wlandau commented Feb 20, 2020

> I think the only twist with CUDA versions is whether we want to support different hardware. Different CUDA versions are compatible with different hardware, so it may be worthwhile to do something like 3.6.2-cuda9, and 3.6.2-cuda10.

Will the TensorFlow version factor in here too, e.g. r3.6.2-tf2.1.0-cuda10? Do you plan to make a strategic subset of R version x TF version x CUDA version?

> Sounds good. Is there any documentation on the hardware-dependency side? I thought CUDA 10 was still compatible with most older NVIDIA GPUs, and it looks like at least some of the newer packages won't run on old CUDA (maybe including current TensorFlow?).

Compatibility of versions | URL
--- | ---
GPU vs compute capability | https://developer.nvidia.com/cuda-gpus#compute
Compute capability vs CUDA SDK | https://en.wikipedia.org/wiki/CUDA#GPUs_supported
CUDA SDK vs driver | https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
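Since driver-vs-SDK mismatches are the common failure mode, here is a small, hypothetical shell helper for checking a driver version against the minimum a CUDA release needs (`version_ge` is my own name; the 440.33 minimum for CUDA 10.2 on Linux comes from NVIDIA's release-notes table linked above):

```shell
# version_ge A B: succeed if version string A >= B (uses GNU sort -V)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: CUDA 10.2 requires driver >= 440.33 on Linux
if version_ge "440.100" "440.33"; then
  echo "driver 440.100 is new enough for CUDA 10.2"
fi
```

On a real host you would feed it the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader` instead of a literal.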

> For the Python virtualenv setup, I played with that a whole bunch (though I think some of the env var handling has improved in the tensorflow R package; it used to have funny behavior where it liked having its own virtualenv separate from reticulate's). So I'm partial to the config I have in https://github.com/rocker-org/ml/blob/master/ubuntu/shared/install_python.sh and https://github.com/rocker-org/ml/blob/master/ubuntu/shared/config_R_cuda.sh, but of course open to discussion. venv / pip seems to better match what the Python folks are doing over at binder etc. as well; they definitely know their Python stack, and I think it helps to have these things aligned.

I agree. For what it's worth, here is what I have been using to set up a local venv/Miniconda build for an RStudio Cloud project. Maybe it could serve as a fallback option?

```r
install.packages("keras")
reticulate::install_miniconda("miniconda")
Sys.setenv(WORKON_HOME = "virtualenvs")
reticulate::virtualenv_create("r-reticulate", python = "miniconda/bin/python")
keras::install_keras(
  method = "virtualenv",
  conda = "miniconda/bin/conda",
  envname = "r-reticulate",
  version = "2.3.1",     # Keras version
  tensorflow = "1.13.1", # TensorFlow version
  restart_session = FALSE
)
```

@cboettig
Member Author

I do think we need to support different CUDA images; I currently have dev images for 10.0 and 10.2. We can pull these from upstream nvidia/cuda. Unfortunately they haven't released an ubuntu:focal build yet, but hopefully they'll do so soon.
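One way the per-CUDA variants could look, as a sketch: build the same recipe against two upstream bases. The `rocker/ml` tag names and the `BASE_IMAGE` build-arg are hypothetical here; the `nvidia/cuda` tags are real upstream 18.04 tags (no focal bases yet, as noted).

```shell
# Build the same image recipe against two upstream CUDA base images
docker build -t rocker/ml:cuda10.0 \
  --build-arg BASE_IMAGE=nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 .
docker build -t rocker/ml:cuda10.2 \
  --build-arg BASE_IMAGE=nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 .
```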

I've been running a few projects on my GPU machine using the 10.0 and 10.2 images, and I find each one needs its own Python virtualenv anyway to support very particular versions of TensorFlow (one runs only on TF 2.1.0, several need TF 2.0.0, and a few need TF 1.14.0). I've found this pretty easy to manage with virtualenvs (though I haven't tried using it with renv yet), so I think that will be the unavoidable way to go.
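For example, per-project environments along these lines; the paths and project names are made up, and the TF pins are the ones mentioned above:

```shell
# One virtualenv per project, each pinning its own TensorFlow version
python3 -m venv ~/venvs/proj-a && ~/venvs/proj-a/bin/pip install "tensorflow==2.1.0"
python3 -m venv ~/venvs/proj-b && ~/venvs/proj-b/bin/pip install "tensorflow==2.0.0"
python3 -m venv ~/venvs/proj-c && ~/venvs/proj-c/bin/pip install "tensorflow==1.14.0"

# Then point each R project at its env, e.g. in that project's .Renviron:
#   RETICULATE_PYTHON=~/venvs/proj-a/bin/python
```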

I think we'll still ship an ML image with a tensorflow installation in place 'out-of-the-box', probably matching the version that the R keras package installs (currently 2.0.0 by default). That should make it easy for most users to get up and running in common situations without having to think about it, but I don't think we can create separate images for every venv configuration.

I'm not sure I've found anything I'm currently working on that needs CUDA 10.0 (or worse, say CUDA 9.0, though I guess if I had anything still pinned at TensorFlow 0.12.0 we would need CUDA 9.0). CUDA lib updates are a bit of a bear since it is easy to create hardware mismatches; see notes here: rocker-org/ml#28
