Install script (#160)

* added install script, updated readme
* updated readme
* formatting
* add careless test subprogram
* ensure latest release
* add example careless test output
* bump version
* reformat code block
* reformat code
* test reformat
* fix formatting
* add newline

---------

Co-authored-by: Kevin Dalton <[email protected]>
rs-station · Jun 13, 2024 · 14c4a38
1 parent 2930d2e
Showing 5 changed files with 66 additions and 30 deletions.
diff --git a/README.md b/README.md
@@ -19,38 +19,58 @@ pip install careless
 ```
 
 ## Installation with GPU Support
-Careless supports GPU acceleration on NVIDIA GPUs. For users who would like to run `careless` in a cluster computing environment, requesting an interactive GPU node for the installation might be helpful. You may want to first follow the latest [Tensorflow installation instructions](https://www.tensorflow.org/install/pip#step-by-step_instructions) and then install `careless`. Specifically, 
-1) Create and activate a new environment with the appropriate Python version
-   ```bash
-   conda create -yn careless python=checktheversion
-   conda activate careless
-   pip install --upgrade pip
-   ```
-2) Install dependencies for GPU support (see the following paragraph)
-3) Install TensorFlow (and verify its GPU support)
-   ```bash
-   pip install tensorflow==checktheversion
-   #check that tensorflow sees the correct number of GPUs
-   python3 -c "import tensorflow as tf; print(len(tf.config.list_physical_devices('GPU')))"
-   ```
-4) Install `careless`
-   ```bash
-   pip install careless
-   ```
-
-The following dependencies are required for GPU support 
- - NVIDIA driver, 
- - CUDA Toolkit, and 
- - cuDNN. 
-
-You can determine the versions required by the latest TensorFlow release from the [TensorFlow docs](https://www.tensorflow.org/install/pip#software_requirements). The driver is usually installed through the system package manager and will require root privileges. In a cluster computing environment, a suitable version of the NVIDIA driver will usually be provided by your system administrators. The two libraries, CUDA toolkit and cuDNN, may either be installed through the system package manager or using the Anaconda python distribution as described in the [TensorFlow docs](https://www.tensorflow.org/install/pip#step-by-step_instructions). 
-
-You may confirm GPU acceleration is active using the `nvidia-smi` command to monitor GPU usage during model training. If you are having trouble enabling GPU support, you may want to use the `--tf-debug` flag during training for verbose logging of TensorFlow issues. 
+Careless supports GPU acceleration on NVIDIA GPUs through the CUDA library. We strongly encourage users to take advantage of this feature. To streamline installation, we maintain a script which installs careless with CUDA support. The following section will guide you through installing careless for the GPU. 
+
+1) **Install the NVIDIA driver** for your accelerator card. On most high-performance computing systems, this driver should be pre-installed. If it is not, we suggest you contact your system administrator, as installation will require elevated privileges. 
+
+    You may check if the driver is functional by running `nvidia-smi`. If it is working properly, you will see output like the following:
+
+        Thu Jun 13 13:01:32 2024                                                                       
+        +-----------------------------------------------------------------------------------------+    
+        | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |    
+        |-----------------------------------------+------------------------+----------------------+    
+        | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |    
+        | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |    
+        |                                         |                        |               MIG M. |    
+        |=========================================+========================+======================|    
+        |   0  Tesla V100S-PCIE-32GB          On  |   00000000:86:00.0 Off |                    0 |    
+        | N/A   32C    P0             25W /  250W |       0MiB /  32768MiB |      0%      Default |    
+        |                                         |                        |                  N/A |    
+        +-----------------------------------------+------------------------+----------------------+    
+                                                                                                       
+        +-----------------------------------------------------------------------------------------+    
+        | Processes:                                                                              |    
+        |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |    
+        |        ID   ID                                                               Usage      |    
+        |=========================================================================================|    
+        |  No running processes found                                                             |    
+        +-----------------------------------------------------------------------------------------+    
+    A faulty driver will give an error message:
+
+        NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running. 
+    If the driver isn't installed, you will see:
+
+        nvidia-smi: command not found
+2) **Install Anaconda**. Once you have confirmed the NVIDIA driver is available, proceed to install the Anaconda python distribution by following the instructions [here](https://docs.anaconda.com/free/anaconda/install/) or as directed by your cluster documentation. Before proceeding, make sure you activate your conda base environment. For typical installations, this happens automatically when you open a new login shell. Alternatively, you may directly source the `conda.sh` in your Anaconda install directory. 
+3) **Install careless** and associated dependencies including CUDA by running: 
+
+        source <(curl -s https://raw.githubusercontent.com/rs-station/careless/main/install-cuda.sh)
+    This will automatically create a new conda environment named careless.
+
+Careless is now installed in its own environment. Whenever you want to run careless, you must first activate the careless conda environment by issuing `conda activate careless`. You can test CUDA support by running the `careless test` subprogram. If your installation was successful, you should see GPU devices listed in the output of `careless test` as in this example:
+
+        (careless) user@computer:~$ careless test
+        Careless version 0.4.2
+        ###############################################
+        # TensorFlow can access the following devices #
+        ###############################################
+         - CPU: /physical_device:CPU:0
+         - GPU: /physical_device:GPU:0
+
 
 ## Dependencies
 
 `careless` is likely to run on any operating system and python version which is compatible with TensorFlow. 
-Pip will handle installation of all dependencies. 
 `careless` uses mostly tools from the conventional scientific python stack plus
  - optimization routines from [TensorFlow](https://www.tensorflow.org/)
  - statistical distributions from [Tensorflow-Probability](https://www.tensorflow.org/probability)
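
The driver check described in step 1 of the new README section can also be scripted. The following is a minimal sketch, not part of this commit: the function name `check_nvidia_driver` and the three status strings are hypothetical, and it assumes that a non-zero exit status from `nvidia-smi` indicates a faulty driver.

```python
import shutil
import subprocess


def check_nvidia_driver(command="nvidia-smi"):
    """Classify the driver state as "missing", "faulty", or "ok".

    Hypothetical helper mirroring the three cases the README describes.
    """
    # If the executable is not on PATH, a shell would report "command not found".
    if shutil.which(command) is None:
        return "missing"
    try:
        result = subprocess.run(
            [command], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # The command exists but could not be run to completion.
        return "faulty"
    # Assumption: a non-zero exit status means the tool could not reach the driver.
    return "ok" if result.returncode == 0 else "faulty"


if __name__ == "__main__":
    print(check_nvidia_driver())
```

A result of `"ok"` only says the driver responds; whether TensorFlow can use the GPU is what `careless test` reports.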

diff --git a/careless/VERSION b/careless/VERSION
@@ -1 +1 @@
-0.4.2
+0.4.3
diff --git a/careless/careless.py b/careless/careless.py
@@ -25,6 +25,14 @@ def run_careless(parser):
         df = LaueFormatter.from_parser(parser)
     elif parser.type == 'mono':
         df = MonoFormatter.from_parser(parser)
+    elif parser.type == 'test':
+        print("###############################################")
+        print("# TensorFlow can access the following devices #")
+        print("###############################################")
+        for dev in tf.config.list_physical_devices():
+            print(f" - {dev.device_type}: {dev.name}")
+        from sys import exit
+        exit()
 
 
     inputs,rac = df.format_files(parser.reflection_files)
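
The report printed by the new `test` branch can be tried without TensorFlow installed. The sketch below is an illustration rather than the code in this commit: `format_device_report` and `FakeDevice` are hypothetical names, and `FakeDevice` stands in for the objects returned by `tf.config.list_physical_devices()`, which expose `name` and `device_type`.

```python
from collections import namedtuple

# Hypothetical stand-in for TensorFlow's physical device objects.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])


def format_device_report(devices):
    """Build the banner and one line per device, as the test branch prints them."""
    banner = "# TensorFlow can access the following devices #"
    border = "#" * len(banner)
    lines = [border, banner, border]
    lines += [f" - {dev.device_type}: {dev.name}" for dev in devices]
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        FakeDevice("/physical_device:CPU:0", "CPU"),
        FakeDevice("/physical_device:GPU:0", "GPU"),
    ]
    print(format_device_report(example))
```

With TensorFlow available, passing `tf.config.list_physical_devices()` to the function should give the same layout as the example output in the README.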

diff --git a/careless/parser.py b/careless/parser.py
@@ -48,6 +48,8 @@ class CustomParser(EnvironmentSettingsMixin):
      - Detect conflicting arguments and raise an informative error
     """
     def _validate_input_files(self, parser):
+        if parser.type == 'test':
+            return
         for inFN in parser.reflection_files:
             if not exists(inFN):
                 self.error(f"Unmerged reflection file {inFN} does not exist")
@@ -88,6 +90,7 @@ def _fill_text(self, text, width, indent):
 subs = parser.add_subparsers(title="Experiment Type", required=True, dest="type")
 mono_sub = subs.add_parser("mono", help="Process monochromatic diffraction data.", formatter_class=CustomFormatter)
 poly_sub = subs.add_parser("poly", help="Process polychromatic, 'Laue', diffraction data.", formatter_class=CustomFormatter)
+test_sub = subs.add_parser("test", help="Print available physical devices", formatter_class=CustomFormatter)
 
 from careless.args import required,poly,groups
 
@@ -112,3 +115,7 @@ def _fill_text(self, text, width, indent):
         mono_group.add_argument(*args, **kwargs)
         poly_group.add_argument(*args, **kwargs)
 
+# Test needs environment settings options
+from careless.args import tf_options
+for args,kwargs in tf_options.args_and_kwargs:
+    test_sub.add_argument(*args, **kwargs)
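
The early return added to `_validate_input_files` is presumably needed because the `test` subparser defines no `reflection_files` argument. The following self-contained sketch reproduces that situation with plain `argparse`; `build_parser`, `validate_input_files`, and the `careless-sketch` program name are hypothetical and much simpler than the real `CustomParser`.

```python
import argparse
from os.path import exists


def build_parser():
    """Build a toy parser with 'mono', 'poly', and 'test' subcommands."""
    parser = argparse.ArgumentParser(prog="careless-sketch")
    subs = parser.add_subparsers(title="Experiment Type", required=True, dest="type")
    for name in ("mono", "poly"):
        sub = subs.add_parser(name)
        sub.add_argument("reflection_files", nargs="+")
    # The test subcommand takes no input files.
    subs.add_parser("test", help="Print available physical devices")
    return parser


def validate_input_files(args):
    """Return the input files that do not exist; 'test' has none to check."""
    if args.type == "test":
        return []
    return [fn for fn in args.reflection_files if not exists(fn)]


if __name__ == "__main__":
    args = build_parser().parse_args(["test"])
    print(args.type, validate_input_files(args))
```

Without the guard, `args.reflection_files` would raise `AttributeError` in this sketch when the `test` subcommand is selected.
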
diff --git a/install-cuda.sh b/install-cuda.sh
@@ -44,4 +44,5 @@ export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"
 unset CUDNN_DIR
 unset PTXAS_DIR' >> $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh
 
-pip install careless
+# Install careless
+pip install --upgrade careless