build: switch to conda dssp installation #549

Merged: 63 commits, Jan 24, 2024
Changes from 56 commits
Commits
ef80c23
add files for building conda env
gcroci2 Nov 27, 2023
a05eb97
add deeprank2 to requirements.txt
gcroci2 Nov 27, 2023
855904b
add Dockerfile
gcroci2 Nov 27, 2023
d3cf4f9
finalize dockerfile
gcroci2 Nov 27, 2023
f8d1a10
update docs for running docker and installation using yml file
gcroci2 Nov 27, 2023
750f0f0
update tutorials for limiting data by default
gcroci2 Nov 27, 2023
a672395
add suggestions from reviews
gcroci2 Nov 28, 2023
2ad3099
add Dani's suggestions
gcroci2 Dec 12, 2023
5583fea
use mambaforge instead of miniconda for make the env build faster
gcroci2 Dec 19, 2023
7e4171b
Merge branch 'main' into 527_add_dockerfile_gcroci2
gcroci2 Dec 19, 2023
ddddca4
Merge branch 'main' into 527_add_dockerfile_gcroci2
gcroci2 Dec 20, 2023
cf3dd80
merge with main
gcroci2 Dec 21, 2023
484d580
update deeprank2 installation version
gcroci2 Dec 21, 2023
5cde0bb
Merge pull request #528 from DeepRank/527_add_dockerfile_gcroci2
gcroci2 Dec 21, 2023
62ca72e
fix typos in the paper
gcroci2 Dec 21, 2023
b0f87a2
remove package ref from index.rst
gcroci2 Dec 21, 2023
abf4af2
improve add_features functionality for users
gcroci2 Dec 21, 2023
b5d6307
fix package reference link
gcroci2 Dec 21, 2023
f6a93ea
Merge pull request #538 from DeepRank/minor_review_joss_gcroci2
gcroci2 Dec 21, 2023
4fb9f8e
add conda installation for dssp in the yml file
gcroci2 Jan 15, 2024
f9f93bc
remove the not-conda dssp installation
gcroci2 Jan 15, 2024
e106c88
add dssp conda installation to the action.yml file
gcroci2 Jan 15, 2024
fc57c94
try to fix dssp conda installation
gcroci2 Jan 15, 2024
4f0a478
add libgcc-ng to the conda installation
gcroci2 Jan 15, 2024
1b5abc4
retry dssp installation fix
gcroci2 Jan 15, 2024
93e7a5a
try to print out warning for failing tests
gcroci2 Jan 18, 2024
9deabe3
install dependencies via the yml file
gcroci2 Jan 18, 2024
91775b6
try to fix conda activate
gcroci2 Jan 18, 2024
0faf301
try to fix conda env installation
gcroci2 Jan 18, 2024
1bc12fc
use conda-incubator action for miniconda
gcroci2 Jan 18, 2024
941823b
update installation on CI
gcroci2 Jan 18, 2024
9968cfb
try again with conda-incubator action
gcroci2 Jan 19, 2024
108234a
try to fix error
gcroci2 Jan 19, 2024
d7f05c5
print info about conda env
gcroci2 Jan 19, 2024
5f902ca
go back to original action but with dssp installed via conda
gcroci2 Jan 19, 2024
7ecbe7d
add pip dep again to the toml
gcroci2 Jan 19, 2024
5b8dd7a
add conda env list
gcroci2 Jan 19, 2024
08a2213
try the installation using the yml file
gcroci2 Jan 19, 2024
f6ee4f5
try to fix env update
gcroci2 Jan 19, 2024
b777787
add h5py to yml
gcroci2 Jan 19, 2024
141b981
add missing deps to the yml file
gcroci2 Jan 19, 2024
97ac8f5
fix packages errors
gcroci2 Jan 19, 2024
c551e47
fix markov-clustering installation
gcroci2 Jan 19, 2024
c1c0c63
remove env name
gcroci2 Jan 19, 2024
144d817
put dependencies back to the toml
gcroci2 Jan 19, 2024
4efc2ba
remove pdb2sql from requirements.txt
gcroci2 Jan 19, 2024
7bc996d
re-add python to yml file
gcroci2 Jan 19, 2024
36796c8
readd deeprank2 to requirements.txt
gcroci2 Jan 19, 2024
986b6dd
readd macos deps installation
gcroci2 Jan 19, 2024
c41f132
remove dssp from yml - the dockerfile does not need to be edited at t…
gcroci2 Jan 19, 2024
d8336aa
update docker file with dssp via conda
gcroci2 Jan 19, 2024
de7ad35
update docs for new dssp installation
gcroci2 Jan 19, 2024
edf44c4
merge with dev
gcroci2 Jan 22, 2024
3a7d69b
add init file to the tests folder
gcroci2 Jan 22, 2024
a9ce360
add cov fail under 80
gcroci2 Jan 22, 2024
db6a6a8
merge with dev
gcroci2 Jan 22, 2024
eb6acf8
formatting
DaniBodor Jan 23, 2024
486a8a1
suggestions to README
DaniBodor Jan 23, 2024
28b9735
fix typos and edit subtitles
gcroci2 Jan 24, 2024
7a4ce85
add toc to docs/installation.md
gcroci2 Jan 24, 2024
08a4013
remove pyhon 3.11 from the CI
gcroci2 Jan 24, 2024
ac12feb
Update docs/installation.md
gcroci2 Jan 24, 2024
f1519d2
Update docs/installation.md
gcroci2 Jan 24, 2024
43 changes: 13 additions & 30 deletions .github/actions/install-python-and-package/action.yml
@@ -1,6 +1,6 @@
name: "Install Python and deeprank2"
name: "Install Python and DeepRank2"

description: "Installs Python, updates pip and installs deeprank2 together with its dependencies."
description: "Installs Python, updates pip and installs DeepRank2 together with its dependencies."

inputs:
python-version:
@@ -27,8 +27,10 @@ runs:
with:
update-conda: true
python-version: ${{ inputs.python-version }}
conda-channels: anaconda
- run: conda --version
conda-channels: pytorch, pyg, bioconda, defaults, sbl, conda-forge
- run: |
conda --version
conda env list
shell: bash {0}
- name: Python info
shell: bash -e {0}
@@ -41,16 +43,16 @@ runs:
CMAKE_INSTALL_PREFIX: .local
if: runner.os == 'Linux'
run: |
# Install dependencies not handled by setuptools
# Install deeprank2 conda dependencies
## DSSP
sudo apt-get install -y dssp
conda install -c sbl dssp>=4.2.2.1
## MSMS
conda install -c bioconda msms
conda install -c bioconda msms>=2.6.1
## PyTorch, PyG, PyG adds
### Installing for CPU only on the CI
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 -c pytorch
pip install torch_geometric==2.3.1
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-$(python3 -c "import torch; print(torch.__version__)")+cpu.html
conda install pytorch=2.1.1 torchvision=0.16.1 torchaudio=2.1.1 cpuonly=2.0.* -c pytorch
conda install pyg=2.4.0 -c pyg
pip install torch_scatter==2.1.2 torch_sparse==0.6.18 torch_cluster==1.6.3 torch_spline_conv==1.2.2 -f https://data.pyg.org/whl/torch-2.1.0+cpu.html
- name: Install dependencies on MacOS
shell: bash {0}
env:
@@ -59,26 +61,7 @@ runs:
run: |
# Install dependencies not handled by setuptools
## DSSP
git clone https://github.com/PDB-REDO/libcifpp.git --recurse-submodules
cd libcifpp
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$HOME/.local -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build
#######
git clone https://github.com/mhekkel/libmcfp.git
cd libmcfp
mkdir build
cd build
cmake ..
cmake --build .
cmake --install .
#######
git clone https://github.com/PDB-REDO/dssp.git
cd dssp
mkdir build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build
conda install -c sbl dssp>=4.2.2.1
## MSMS
cd /tmp/
wget http://mgltools.scripps.edu/downloads/tars/releases/MSMSRELEASE/REL2.6.1/msms_i86Linux2_2.6.1.tar.gz
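
A note on the version specifiers in the `conda install` lines of this file: in bash, an unquoted `>` starts an output redirection, so `dssp>=4.2.2.1` is parsed as the argument `dssp` plus a redirect to a file named `=4.2.2.1`, and the version constraint never reaches conda. The sketch below demonstrates the parsing, with `printf` standing in for `conda` (an assumption made so it runs without conda installed); quoting the specifier avoids the problem.

```bash
# Scratch directory so the stray file is easy to find and remove.
scratch="$(mktemp -d)"
cd "$scratch"

# Unquoted: ">" is a redirection operator, so printf receives only "dssp"
# and its output is written to a file named "=4.2.2.1".
printf '%s\n' dssp>=4.2.2.1

# Quoted: the full specifier is passed through as one argument.
quoted="$(printf '%s\n' "dssp>=4.2.2.1")"
echo "$quoted"

# The same quoting applied to the install steps would look like:
#   conda install -c sbl "dssp>=4.2.2.1"
#   conda install -c bioconda "msms>=2.6.1"
```

With the quoted form, conda receives the full match specification and can enforce the minimum version.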
2 changes: 1 addition & 1 deletion .github/workflows/coveralls.yml
@@ -46,7 +46,7 @@ jobs:
python-version: ${{ matrix.python-version }}
extras-require: test
- name: Run unit tests with coverage
run: pytest --cov --cov-append --cov-report xml --cov-report term --cov-report html
run: pytest --cov --cov-append --cov-report xml --cov-fail-under=80 --cov-report term --cov-report html
- name: Coveralls
env:
GITHUB_TOKEN: ${{ secrets.github_token }}
32 changes: 32 additions & 0 deletions Dockerfile
@@ -0,0 +1,32 @@
# Pull base image
FROM --platform=linux/x86_64 condaforge/miniforge3:23.3.1-1

# Add files
ADD ./tutorials /home/deeprank2/tutorials
ADD ./env/environment.yml /home/deeprank2
ADD ./env/requirements.txt /home/deeprank2

# Install
RUN \
apt update -y && \
apt install unzip -y && \
## GCC
apt install -y gcc && \
## Conda and pip deps
mamba env create -f /home/deeprank2/environment.yml && \
## Get the data for running the tutorials
if [ -d "/home/deeprank2/tutorials/data_raw" ]; then rm -Rf /home/deeprank2/tutorials/data_raw; fi && \
if [ -d "/home/deeprank2/tutorials/data_processed" ]; then rm -Rf /home/deeprank2/tutorials/data_processed; fi && \
wget https://zenodo.org/records/8349335/files/data_raw.zip && \
unzip data_raw.zip -d data_raw && \
mv data_raw /home/deeprank2/tutorials

# Activate the environment
RUN echo "source activate deeprank2" > ~/.bashrc
ENV PATH /opt/conda/envs/deeprank2/bin:$PATH

# Define working directory
WORKDIR /home/deeprank2

# Define default command
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--NotebookApp.token=''","--NotebookApp.password=''", "--allow-root"]
127 changes: 86 additions & 41 deletions README.md
@@ -1,4 +1,4 @@
# Deeprank2
# DeepRank2

| Badges | |
| :------------: ||
@@ -34,54 +34,100 @@ DeepRank2 extensive documentation can be found [here](https://deeprank2.rtfd.io/

## Table of contents

- [Deeprank2](#deeprank2)
- [DeepRank2](#deeprank2)
- [Overview](#overview)
- [Table of contents](#table-of-contents)
- [Installation](#installation)
- [Dependencies](#dependencies)
- [Deeprank2 Package](#deeprank2-package)
- [Test installation](#test-installation)
- [Contributing](#contributing)
- [Data generation](#data-generation)
- [Datasets](#datasets)
- [GraphDataset](#graphdataset)
- [GridDataset](#griddataset)
- [Training](#training)
- [Installations](#installations)
- [Containerized Installation](#containerized-installation)
- [Local/remote installation](#localremote-installation)
- [Non-pythonic dependencies](#non-pythonic-dependencies)
- [Pythonic dependencies](#pythonic-dependencies)
- [Deeprank2 Package](#deeprank2-package)
- [Test installation](#test-installation)
- [Contributing](#contributing)
- [Data generation](#data-generation)
- [Datasets](#datasets)
- [GraphDataset](#graphdataset)
- [GridDataset](#griddataset)
- [Training](#training)
- [Run a pre-trained model on new data](#run-a-pre-trained-model-on-new-data)
- [Computational performances](#computational-performances)
- [Package development](#package-development)

## Installation
## Installations

The package officially supports only the ubuntu-latest OS, which is tested extensively through the continuous integration workflows.

### Dependencies
You can either install DeepRank2 in a [dockerized container](#containerized-installation), which will allow you to run our [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials), or you can [install the package locally](#localremote-installation).

Before installing deeprank2 you need to install some dependencies. We advise to use a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with Python >= 3.10 installed. The following dependency installation instructions are updated as of 14/09/2023, but in case of issues during installation always refer to the official documentation which is linked below:
### Containerized Installation

- [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
- [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
- [PyTorch](https://pytorch.org/get-started/locally/)
- We support torch's CPU library as well as CUDA.
- Currently, the package is tested using [PyTorch 2.0.1](https://pytorch.org/get-started/previous-versions/#v201).
- [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
- [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
- Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
- on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
- on older versions of ubuntu or on mac or lacking sudo priviliges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed. Alternatively, follow [this](https://github.com/PDB-REDO/libcifpp/issues/49) thread.
- [GCC](https://gcc.gnu.org/install/)
- Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
- For MacOS with M1 chip users only install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
To try out the package without worrying about your OS or installing all the required dependencies, we provide a `Dockerfile` that sets everything up in a suitable container. After cloning the repository and installing [Docker](https://docs.docker.com/engine/install/), run the following commands from the root of the repository (you may need sudo permission).

## Deeprank2 Package
Build the Docker image:

Once the dependencies are installed, you can install the latest stable release of deeprank2 using the PyPi package manager:
```bash
docker build -t deeprank2 .
```

Run the Docker container:

```bash
docker run -p 8888:8888 deeprank2
```

This assumes that your application inside the container is listening on port 8888, and you want to map it to port 8888 on your host machine. Open a browser and go to `http://localhost:8888` to access the application running inside the Docker container and run the tutorials' notebooks.
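
If port 8888 is already taken on the host, only the left-hand side of `-p` (the host port) needs to change; the container side stays at 8888. A minimal sketch, where `run_tutorials` is a hypothetical helper name and not part of the repository:

```bash
# Hypothetical helper: start the tutorial container on a chosen host port.
# The container side stays at 8888, the port the notebook server is assumed
# to listen on inside the image.
run_tutorials() {
    host_port="${1:-8888}"
    docker run -p "${host_port}:8888" deeprank2
}

# Example: `run_tutorials 8889`, then open http://localhost:8889
```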

More details about the tutorials' content can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that in the Docker container only the raw PDB files are downloaded, which are needed as a starting point for the tutorials. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please install the package locally to fully leverage its capabilities.

After running the tutorials, you may want to remove the (quite large) Docker image from your machine. In this case, remember to [stop the container](https://docs.docker.com/engine/reference/commandline/stop/) and then [remove the image](https://docs.docker.com/engine/reference/commandline/image_rm/). More general information about Docker can be found on the [official website docs](https://docs.docker.com/get-started/).
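
The cleanup can be sketched as follows. Docker refuses to remove an image that a stopped container still references, so the sketch also removes the containers before the image. `remove_deeprank2_image` is a hypothetical helper name, and the image tag `deeprank2` is assumed to match the one used in the build command:

```bash
# Hypothetical helper: stop and remove every container created from the
# deeprank2 image, then remove the image itself.
remove_deeprank2_image() {
    # -a includes stopped containers; -q prints only the container IDs.
    ids="$(docker ps -aq --filter ancestor=deeprank2)"
    for id in $ids; do
        docker stop "$id"
        docker rm "$id"
    done
    docker image rm deeprank2
}
```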

### Local/remote installation

#### Non-pythonic dependencies

Instructions are up to date as of 19 Jan 2024.

Before installing DeepRank2 you need to install some dependencies:

* [GCC](https://gcc.gnu.org/install/)
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.

#### Pythonic dependencies

Instructions are up to date as of 19 Jan 2024.

Then, you can use the YML file we provide for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of the package and all the other necessary conda and pip dependencies (CPU only, Python 3.10):

```bash
# Ensure you are in your base environment
conda activate
# Create the environment
conda env create -f env/environment.yml
# Activate the environment
conda activate deeprank2
```

Alternatively, if you are a MacOS user, if the YML file installation is not successful, or if you want to use CUDA or Python 3.11, you can install each dependency separately and then install the latest stable release of the package using the PyPI package manager. In this case too, we advise using a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). In case of issues during installation, please refer to the official documentation for each package (linked below), as our instructions may be out of date:

* [DSSP 4](https://anaconda.org/sbl/dssp): `conda install -c sbl dssp`.
* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* We support torch's CPU library as well as CUDA.
* Currently, the package is tested using [PyTorch 2.0.1](https://pytorch.org/get-started/previous-versions/#v201).
* [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* MacOS users with an M1 chip only: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).

#### Deeprank2 Package

Finally do:

```bash
pip install deeprank2
```

Alternatively, get all the new developments by cloning the repo and installing the editable version of the package with:
Alternatively, get the latest updates by cloning the repo and installing the editable version of the package with:

```bash
git clone https://github.com/DeepRank/deeprank2
@@ -91,22 +137,21 @@ pip install -e .'[test]'

The `test` extra is optional, and can be used to install test-related dependencies useful during the development.

### Test installation
#### Test installation

If you have installed the package from a cloned repository (second option above), you can check that all components were installed correctly, using pytest.
If you have installed the package from a cloned repository (the latter option above), you can check that all components were installed correctly, using pytest (run `pip install pytest` if you did not install it above).
The quick test should be sufficient to ensure that the software works, while the full test (a few minutes) will cover a much broader range of settings to ensure everything is correct.

Run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).

### Contributing
## Contributing

If you would like to contribute to the package in any way, please see [our guidelines](CONTRIBUTING.rst).

The following section serves as a first guide to start using the package, using protein-protein Interface (PPI) queries
as example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SVR data, and for the training pipeline.
The following section serves as a first guide to start using the package, using protein-protein interface (PPI) queries as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SVR data, and for the training pipeline.
For more details, see the [extended documentation](https://deeprank2.rtfd.io/).

### Data generation
## Data generation

For each protein-protein complex (or protein structure containing a missense variant), a `Query` can be created and added to the `QueryCollection` object, to be processed later on. Two subtypes of `Query` exist: `ProteinProteinInterfaceQuery` and `SingleResidueVariantQuery`.

@@ -189,11 +234,11 @@ hdf5_paths = queries.process(
grid_map_method = MapMethod.GAUSSIAN)
```

### Datasets
## Datasets

Data can be split in sets implementing custom splits according to the specific application. Assuming that the training, validation and testing ids have been chosen (keys of the HDF5 file/s), then the `DeeprankDataset` objects can be defined.

#### GraphDataset
### GraphDataset

For training GNNs the user can create a `GraphDataset` instance:

@@ -227,7 +272,7 @@ dataset_test = GraphDataset(
)
```

#### GridDataset
### GridDataset

For training CNNs the user can create a `GridDataset` instance:

@@ -259,7 +304,7 @@ dataset_test = GridDataset(
)
```

### Training
## Training

Let's define a `Trainer` instance, using for example the already existing `GINet`. Because `GINet` is a GNN, it requires a dataset instance of type `GraphDataset`.

59 changes: 57 additions & 2 deletions docs/features.md
@@ -4,7 +4,9 @@ Features implemented in the code-base are defined in `deeprank2.feature` subpack

## Custom features

Users can add custom features by creating a new module and placing it in `deeprank2.feature` subpackage. One requirement for any feature module is to implement an `add_features` function, as shown below. This will be used in `deeprank2.models.query` to add the features to the nodes or edges of the graph.
Users can add custom features by cloning the repository, creating a new module, and placing it in the `deeprank2.features` subpackage. The custom features can then be used by installing the package in editable mode (see [here](https://deeprank2.readthedocs.io/en/latest/installation.html#install-deeprank2) for more details). We strongly recommend submitting a pull request (PR) to merge the new feature into the official repository.

One requirement for any feature module is to implement an `add_features` function, as shown below. This will be used in `deeprank2.models.query` to add the features to the nodes or edges of the graph.

```python

@@ -20,7 +22,60 @@ def add_features(
pass
```

The following is a brief description of the features already implemented in the code-base, for each features' module.
Additionally, the nomenclature of the custom feature should be added in `deeprank2.domain.edgestorage` or `deeprank2.domain.nodestorage`, depending on which type of feature it is.

As an example, this is the implementation of the node feature `res_type`, which represents the one-hot encoding of the amino acid residue and is defined in the `deeprank2.features.components` module:

```python
from typing import Optional

from deeprank2.domain import nodestorage as Nfeat
from deeprank2.molstruct.atom import Atom
from deeprank2.molstruct.residue import Residue, SingleResidueVariant
from deeprank2.utils.graph import Graph

def add_features(
    pdb_path: str, graph: Graph,
    single_amino_acid_variant: Optional[SingleResidueVariant] = None
):

    for node in graph.nodes:
        # A node represents either a whole residue or a single atom.
        if isinstance(node.id, Residue):
            residue = node.id
        elif isinstance(node.id, Atom):
            atom = node.id
            residue = atom.residue
        else:
            raise TypeError(f"Unexpected node type: {type(node.id)}")

        node.features[Nfeat.RESTYPE] = residue.amino_acid.onehot
```

`RESTYPE` is the name of the variable assigned to the feature `res_type` in `deeprank2.domain.nodestorage`. To use the feature through the DeepRank2 API, its module needs to be imported and passed in when processing the queries:

```python
from deeprank2.features import components

feature_modules = [components]

# Save data into 3D-graphs only
hdf5_paths = queries.process(
    "<output_folder>/<prefix_for_outputs>",
    feature_modules = feature_modules)
```

Then, the feature `res_type` can be used from the DeepRank2 datasets API:

```python
from deeprank2.dataset import GraphDataset

node_features = ["res_type"]

dataset = GraphDataset(
    hdf5_path = hdf5_paths,
    node_features = node_features
)
```

The following is a brief description of the features already implemented in the code-base, for each features' module.

## Default node features

Expand Down