Skip to content

Latest commit

 

History

History
234 lines (174 loc) · 12.6 KB

manage_libraries.md

File metadata and controls

234 lines (174 loc) · 12.6 KB

Table of Contents

Managing libraries and versions

Python is just like all software, ever evolving and improving. So how do we manage making updates? And if there are updates, does everything else just work? Well, unfortunately no. Updating a library will often have requirements to update other libraries. And when your project becomes larger, this can become quite difficult to manually manage. This is where we stop trying to manually manage and use some other software to do that for us. We can always install packages using pip and we can set the version to install, but this can become difficult.

A common software package manager is called conda. This is software that is used to install Python and libraries you choose. It has the ability to pull from different source locations of your choosing, but most importantly it performs the version compatibility search to give the best chance of all your code working correctly.

Installing Python of your choosing

Chances are you have Python installed on your computer. The system actually uses Python for some of its processes. But you will not want to use that version. Updating will require root privileges and if you make a mistake it can cause major problems. It is best to install a different Python and manage that. I currently recommend conda for installing Python and the dependencies.

The other reason we will use conda is that we can have multiple Python installations with different dependencies for different projects. This allows us to try out some new packages or a new version without breaking other code we have running. The different Python installations will be in an environment and we can switch to different environments at any time.

Create a new environment

First we will use conda to create a new empty environment. The environment should have a name to make it easy to switch to a different environment and know which environment you are currently using. There is another method that uses the path to the environment, but we will talk about that later. This will create a new environment called my_env and we will enter that environment.

which python
  /Users/Galahad/miniconda3/envs/dqo-base/bin/python
  
conda create --name my_env
conda activate my_env
which python

Here we searched for which Python is installed and will be used when called. The system finds /Users/Galahad/miniconda3/envs/dqo-base/bin/python. Then we create a new envrionment called my_env and activate that environment. When we search for Python it does not find one. That is because the previous Python we found was in a different environment. When we switch environments the old environment is not accessible. This is a good thing because it means each environment is independent. If we want to install Python we need to tell conda to install Python when we create the environment. In this example, we state which version of Python to install with "=3.11" appended to python. Since we are creating a new environment with the same name it will ask if I want to remove it before creating a new environment with the same name. Notice I needed to exit the environment with conda deactivate before replacing it.

conda deactivate
conda create -n my_env python=3.11
WARNING: A conda environment already exists at '/Users/Galahad/miniconda3/envs/my_env'
Remove existing environment (y/[n])? y
  
  ... stuff ...

conda env list
  # conda environments:
  #
  base                     /Users/Galahad/miniconda3
  class                    /Users/Galahad/miniconda3/envs/class
  dqo-base              *  /Users/Galahad/miniconda3/envs/dqo-base
  my_env                   /Users/Galahad/miniconda3/envs/my_env

conda activate my_env
python
>>> import copy
>>> copy.__file__
  '/Users/Galahad/miniconda3/envs/my_env/lib/python3.11/copy.py'
>>> import pandas
Traceback (most recent call last):
  File "", line 1, in 
ModuleNotFoundError: No module named 'pandas'

After the environment is created, we run conda env list to get a list of all currently available conda environments. We activate the environment with conda activate my_env. If we start Python and import a module we can see that the module lives in a path exclusive to the new environment called my_env. Importing Pandas will not work because we have not installed it yet.

We can install libraries with a simple request to conda. We can provide specific versions to install or just let it install the latest version that is going to work. Conda will search all the different libraries installed and to be installed to ensure they are compatible. If the library version needs to change to work, it will ask if we want to change/update a library to be compatible.

conda install pandas xarray
python
>>> import xarray, pandas

As we add more and more libraries we can see what is installed with conda list. This will list all the libraries currently installed in the current conda environment and the version installed. Notice that we asked for python 3.11 and it installed 3.11.7. If we leave a level of the version off it will use the most current version at that level. Since we didn't specify which Pandas to install it installed the latest version compatible with all other libraries.

conda list
  
# packages in environment at /Users/kehoe/miniconda3/envs/my_env:
#
# Name                    Version                   Build  Channel
blas                      1.0                    openblas  
bottleneck                1.3.7           py311hb9f6ed7_0  
bzip2                     1.0.8                h620ffc9_4  
ca-certificates           2023.12.12           hca03da5_0  
libcxx                    14.0.6               h848a8c0_0  
libffi                    3.4.4                hca03da5_0  
libgfortran               5.0.0           11_3_0_hca03da5_28  
libgfortran5              11.3.0              h009349e_28  
libopenblas               0.3.21               h269037a_0  
llvm-openmp               14.0.6               hc6e5704_0  
ncurses                   6.4                  h313beb8_0  
numexpr                   2.8.7           py311h6dc990b_0  
numpy                     1.26.3          py311he598dae_0  
numpy-base                1.26.3          py311hfbfe69c_0  
openssl                   3.0.13               h1a28f6b_0  
packaging                 23.1            py311hca03da5_0  
pandas                    2.1.4           py311h7aedaa7_0  
pip                       23.3.1          py311hca03da5_0  
python                    3.11.7               hb885b13_0

  ... more libraries

It may become important to create environments on different systems, or it may be important to keep track of which libraries we need for each project. Conda makes this easy with the concept of an environment file. This is a YAML file that directs conda on how to create a new environment. The file can be used to direct the environment building or to document the current environment of an existing environment. Let's have conda create an environment file describing my_env.

conda env export > environment.yaml

If we look at this file, we will see the name of the environment and all the installed libraries with the currently installed version. We can take this file and provide it to the conda create command to generate a new environment using all the libraries and the versions listed. But normally we do not need to list every package since many of the libraries installed are dependent and will be installed anyway. All we need to recreate this is to list the three packages we requested. And if we don't care about versions we can leave that off and let conda figure it out. Most of the time we can ignore versions. At some point we will need to fix an issue by manually listing what version we want installed, but we will cross that bridge another day.

So we can take the file created by conda and edit it to only list what we need to reproduce the current environment to the specific level for our needs. If we edit the file it will look something like this:

name: my_env

channels:
  - defaults

dependencies:
  - pandas
  - python=3.11
  - xarray

We can now take this file and create a new environment from scratch by pointing to this environment.yaml file for all configuration needs. Since the environment my_env already exist we will need to add the --force keyword. If the environment did not exist we could ignore that keyword.

conda env create -f environment.yaml --force
Collecting package metadata (repodata.json): done
Solving environment: done

  ... stuff ...

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Solving issues with conda

Sometimes the package we want will not be available in the default location (channel). Then we need to do a little work to go find it and tell conda where to download from. For example pint is not located in conda's default location. So when we try to install, it will fail.

conda install pint

Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - pint

Current channels:

  - https://repo.anaconda.com/pkgs/main/osx-arm64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-arm64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

We need to do a little digging to see where the package exists. We can start by searching using conda defaults

conda search pint
  
Loading channels: done
No match found for: pint. Search: *pint*

If we tell conda where to look we can find it at a different host.

conda search -c conda-forge pint
Loading channels: done
# Name                       Version           Build  Channel             
pint                           0.8.1            py_1  conda-forge         

 ... more stuff ...
  
pint                            0.23    pyhd8ed1ab_0  conda-forge 

This means if we tell conda to install pint from a different channel (conda-forge), it will find the package and install. We perform this on the command line

conda install -c conda-forge pint

or we can update the environment.yaml file to have a list of places to look for packages. It will try to install the package with the first channel and if that fails it will go _down the list until it is successful. So if we update our environment.yaml file to look like this we can run the same command above and it will install pint from _conda-forge.

name: my_env

channels:
  - defaults
  - conda-forge

dependencies:
  - pandas
  - python=3.11
  - xarray
  - pint

But what if the package does not exist in any conda channels and is only installable through pip? We can tell conda to install some packages using pip by editing the environment.yaml file to list pip as a dependency and then use pip to install pint.

name: my_env

channels:
  - defaults

dependencies:
  - pip
  - pandas
  - python=3.11
  - xarray
  
  # This part allows install using pip
  - pip:
    - pint

Summary

Managing versions and dependencies for a small number of libraries is manageable by hand, but as the project scales it is best to use something else to do the work for you.

Conda can install Python and the libraries you want and put them into a separate environment so different projects do not interfere with each other if they have conflicts.

Using environment.yaml files to manage the libraries and dependencies can simplify your work. Typically it is best practice to erase an environment and start from new when making changes to more complex projects. There can be unknown issues that are resolved by starting from scratch. The environment.yaml file can be part of the project.

If each project uses a different conda environment you will need to remember to activate that environment before running. If everything fits into a single environment you can set that to be a default in the startup file for your shell (e.g. .bashrc or .bash_profile file)