release: merge dev with main for release 3.0.0 #562

Merged · 191 commits · Jan 25, 2024

Commits (191)
e6feec0
rename `Query` class to `DeepRankQuery`
DaniBodor Sep 5, 2023
87e6156
make `DeepRankQuery` a dataclass with common args
DaniBodor Sep 6, 2023
d6112aa
improve readability of `DeepRankQuery` methods
DaniBodor Sep 6, 2023
068a03d
create generic `SingleResidueVariantQuery`
DaniBodor Sep 6, 2023
0b0f4f0
merge SRV `build` methods
DaniBodor Sep 6, 2023
0ece474
merge both PPI queries into single dataclass
DaniBodor Sep 6, 2023
908f3e9
improve readability of PPI helper functions
DaniBodor Sep 6, 2023
985be65
default `distance_cutoff` dependent on situation
DaniBodor Sep 6, 2023
8630e05
update and refactor tests, notebooks, and READMEs
DaniBodor Sep 6, 2023
95ce32d
remove obsolete hydrogenation code
DaniBodor Sep 6, 2023
17eb287
replace `_load_ppi_atoms` function
DaniBodor Sep 6, 2023
2f08eee
remove `_get_atom_node_key` method
DaniBodor Sep 7, 2023
fe86f62
update dev readme
gcroci2 Sep 25, 2023
a5ad04a
add relevant attributes to the Trainer and improve their logic
gcroci2 Oct 19, 2023
a094e2f
add tests for testing when no test is provided and when no model is …
gcroci2 Oct 19, 2023
a6209cb
fix test_optim
gcroci2 Oct 20, 2023
d562513
change dataset_train to train_data and update docs, for later functio…
gcroci2 Oct 20, 2023
fa45c60
change dataset_train to train_data in all relevant scripts
gcroci2 Oct 20, 2023
fe18d3d
improve logic for handling both a pre-trained model and a dataset_tra…
gcroci2 Oct 20, 2023
f9b82de
add logic for handling the pre-trained model as input in DeeprankData…
gcroci2 Oct 21, 2023
c21fe2b
add tests for catching incorrect pre-trained models
gcroci2 Oct 21, 2023
fc5f6af
add folder for pretrained models in tests
gcroci2 Oct 21, 2023
0a068b5
update data paths in test_dataset.py
gcroci2 Oct 21, 2023
bcc5138
implement inheritance in dataset from a pre-trained model
gcroci2 Oct 23, 2023
5280ece
add tests for inheritance from pre-trained model
gcroci2 Oct 23, 2023
7fcc033
add classes_to_index as inherited param and to the pre-trained model
gcroci2 Oct 23, 2023
cc3d79a
add classes_to_index to the tests' models
gcroci2 Oct 23, 2023
6b037ad
add classes_to_index's check to the tests
gcroci2 Oct 23, 2023
8e2496c
save features_transform's lambdas as strings and load them as functio…
gcroci2 Oct 23, 2023
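The technique referenced in this commit — persisting a `features_transform` lambda as its source string so it can be stored in an HDF5 file, then rebuilding the callable at load time — can be sketched as follows. The helper names are hypothetical, not DeepRank2's actual API:

```python
import inspect

def lambda_to_string(fn):
    # Recover the lambda's source text so it can be stored as a plain
    # string attribute (e.g. in an HDF5 file).
    src = inspect.getsource(fn)
    return src[src.index("lambda"):].strip()

def string_to_lambda(src):
    # Rebuild the callable from the stored source string.
    return eval(src)

transform = lambda t: t + 1
stored = lambda_to_string(transform)    # "lambda t: t + 1"
restored = string_to_lambda(stored)
```

Storing the source string avoids pickling the function object itself, which is why the follow-up commit below can drop the `dill` dependency.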
11f826e
update pre-trained models
gcroci2 Oct 23, 2023
0fcea3a
add trainer tests for testing without defining the dataset_train
gcroci2 Oct 23, 2023
f8e6c57
fix test_dataset.py for the newly defined features_transform in the s…
gcroci2 Oct 23, 2023
b8e2348
remove dill usage since we're not saving lambda functions anymore (di…
gcroci2 Oct 23, 2023
060e6bf
improve initialization order in the Trainer class
gcroci2 Oct 23, 2023
1d3c0f5
fix datasets for cases in which there is a target attribute but no ta…
gcroci2 Oct 24, 2023
4d588e1
fix Trainer _eval method for cases in which there is a target attribu…
gcroci2 Oct 24, 2023
147a16a
add logic for checking the target settings in the init, and fix _filt…
gcroci2 Oct 24, 2023
08c90e0
add tests for cases with no target and improve target's filter tests
gcroci2 Oct 24, 2023
3b5d746
fix tests according to the new target's checks
gcroci2 Oct 24, 2023
a5fd524
add hdf5 file with no target
gcroci2 Oct 24, 2023
d728b2a
add new file with no target
gcroci2 Oct 24, 2023
0d35844
Merge branch '510_testing_pre_trained_gcroci2' of https://github.com/…
gcroci2 Oct 24, 2023
32f3e25
add test for verifying that the testing output is correct when target…
gcroci2 Oct 24, 2023
38f26c8
fix prospector errors
gcroci2 Oct 24, 2023
eee5512
fix build with python 3.11
gcroci2 Oct 24, 2023
d842fc7
try to fix geometric installation using pip instead of conda
gcroci2 Oct 24, 2023
e1265ae
fix prospector error
gcroci2 Oct 24, 2023
10a5795
add docs for testing a pre-trained model
gcroci2 Oct 25, 2023
bf1c8a3
fix bug in trainer for testing cases with no target
gcroci2 Oct 30, 2023
d19b561
Update README.dev.md
gcroci2 Nov 2, 2023
606dbd1
Delete deeprank2/tools/classdiagrams.sh
gcroci2 Nov 2, 2023
959f113
make `_load_pssm_data` a method of `DeepRankQuery`
DaniBodor Sep 11, 2023
ac42626
make `_check_pssm` a method of `DeepRankQuery`
DaniBodor Sep 11, 2023
8519280
update type hinting in query.py to 3.10 format
DaniBodor Sep 12, 2023
99a2c24
move `QueryCollection` to end of module
DaniBodor Sep 12, 2023
519a960
refactor `QueryCollection`
DaniBodor Sep 12, 2023
24af527
define separate parent and child build methods
DaniBodor Sep 12, 2023
3c6df95
refactor child specific helper functions of build
DaniBodor Sep 12, 2023
b417b9d
update linter
DaniBodor Sep 22, 2023
a2640b6
update docstrings and error messages
DaniBodor Sep 22, 2023
6480644
make distance cutoff uniform for query types
DaniBodor Oct 9, 2023
c2151e7
rename `DeepRankQuery` back to `Query`
DaniBodor Oct 9, 2023
d0d5957
separate `interaction_radius` from `max_edge_distance`
DaniBodor Sep 18, 2023
9645914
use separate radius and max_edge_dist in tests and notebooks
DaniBodor Sep 22, 2023
6df8f63
cleaner type hinting throughout
DaniBodor Sep 22, 2023
97db708
Merge pull request #492 from DeepRank/480_new
DaniBodor Nov 7, 2023
0bbe2aa
fix training uml link
gcroci2 Nov 8, 2023
d1716ee
fix typo in training uml link
gcroci2 Nov 8, 2023
0d07c18
fix link for training uml
gcroci2 Nov 8, 2023
4323410
upload svgs and try to link them to the readme.dev
gcroci2 Nov 9, 2023
cd2f502
improve svg image visualization for umls
gcroci2 Nov 9, 2023
e4b387b
add ref for the training uml
gcroci2 Nov 9, 2023
ae3fff3
Merge pull request #519 from DeepRank/251_regenerate_class_diagrams_g…
gcroci2 Nov 9, 2023
d5acd2c
update `max_edge_length` and `influence_radius`
DaniBodor Nov 17, 2023
623ef38
Merge branch 'dev' into 510_testing_pre_trained_gcroci2
gcroci2 Nov 17, 2023
4a2b32a
uniform use_tqdm parameter
gcroci2 Nov 17, 2023
d91377f
uniform root_directory_path parameter
gcroci2 Nov 17, 2023
96c9b94
uniform parameters' order in dataset.py
gcroci2 Nov 17, 2023
86b15fb
put redundant code for inheriting training info in the parent class
gcroci2 Nov 17, 2023
4164c5b
uniform grp variable
gcroci2 Nov 17, 2023
6a6a305
make inherit_params an attribute of the dataset classes
gcroci2 Nov 17, 2023
e92ae10
Merge pull request #504 from DeepRank/460_radius_vs_edgedistance_dbodor
DaniBodor Nov 17, 2023
79b00a2
improve testing new data part
gcroci2 Nov 21, 2023
ef57a25
add testing new data in the readme
gcroci2 Nov 21, 2023
d07b8c3
uniform pretrained_model_path to pretrained_model
gcroci2 Nov 21, 2023
55c061c
make error msg about the dataset clearer
gcroci2 Nov 21, 2023
2584917
use None instead of 'None' in the trainer _eval and _epoch methods
gcroci2 Nov 21, 2023
0906b2d
fix prospector errors
gcroci2 Nov 21, 2023
ae7fe4d
Merge branch 'dev' into 510_testing_pre_trained_gcroci2
gcroci2 Nov 21, 2023
836c5b2
move features checking after inheritance
gcroci2 Nov 22, 2023
b364eba
fix prospector errors
gcroci2 Nov 22, 2023
780f2b9
try to fix optimizer error in py3.11
gcroci2 Nov 22, 2023
2b69803
refactor: amino acid dictionaries
DaniBodor Sep 22, 2023
0d994de
docs: improve code documentation of utils/buildgraph.py
DaniBodor Sep 22, 2023
60775bd
refactor: reading atom data from pdb2sql object
DaniBodor Sep 22, 2023
da0f510
refactor: finding residue from pdb2sql key
DaniBodor Sep 22, 2023
2708c5c
style: make build_atomic_graph and build_residue_graph look similar
DaniBodor Sep 22, 2023
a777cb7
refactor: unify build_graph functions
DaniBodor Sep 22, 2023
9d508c0
Merge pull request #507 from DeepRank/506_buildgraph_unification_dbodor
DaniBodor Nov 23, 2023
2935414
Update docs/getstarted.md
gcroci2 Nov 24, 2023
5f55e68
Update README.dev.md
gcroci2 Dec 21, 2023
341b1d0
merge with main
gcroci2 Dec 21, 2023
1b8f1f0
update integration tests with the new query api
gcroci2 Dec 21, 2023
2e4ec65
remove train parameter from dataset.py
gcroci2 Jan 3, 2024
d1da845
remove train refs from trainer.py
gcroci2 Jan 3, 2024
0bb3023
update tests with the new train_data logic
gcroci2 Jan 3, 2024
0255dcf
update docs
gcroci2 Jan 3, 2024
f4ba712
update tutorials
gcroci2 Jan 3, 2024
bff8a3d
change train_data to train_source
gcroci2 Jan 3, 2024
8c3c5d1
add comment for clarifying tests
gcroci2 Jan 3, 2024
3d31c59
merge with dev
gcroci2 Jan 3, 2024
7f33a68
fix integration test
gcroci2 Jan 3, 2024
226ff35
Merge pull request #515 from DeepRank/510_testing_pre_trained_gcroci2
gcroci2 Jan 3, 2024
4fb9f8e
add conda installation for dssp in the yml file
gcroci2 Jan 15, 2024
f9f93bc
remove the not-conda dssp installation
gcroci2 Jan 15, 2024
e106c88
add dssp conda installation to the action.yml file
gcroci2 Jan 15, 2024
fc57c94
try to fix dssp conda installation
gcroci2 Jan 15, 2024
4f0a478
add libgcc-ng to the conda installation
gcroci2 Jan 15, 2024
1b5abc4
retry dssp installation fix
gcroci2 Jan 15, 2024
93e7a5a
try to print out warning for failing tests
gcroci2 Jan 18, 2024
9deabe3
install dependencies via the yml file
gcroci2 Jan 18, 2024
91775b6
try to fix conda activate
gcroci2 Jan 18, 2024
0faf301
try to fix conda env installation
gcroci2 Jan 18, 2024
1bc12fc
use conda-incubator action for miniconda
gcroci2 Jan 18, 2024
941823b
update installation on CI
gcroci2 Jan 18, 2024
9968cfb
try again with conda-incubator action
gcroci2 Jan 19, 2024
108234a
try to fix error
gcroci2 Jan 19, 2024
d7f05c5
print info about conda env
gcroci2 Jan 19, 2024
5f902ca
go back to original action but with dssp installed via conda
gcroci2 Jan 19, 2024
7ecbe7d
add pip dep again to the toml
gcroci2 Jan 19, 2024
5b8dd7a
add conda env list
gcroci2 Jan 19, 2024
08a2213
try the installation using the yml file
gcroci2 Jan 19, 2024
f6ee4f5
try to fix env update
gcroci2 Jan 19, 2024
b777787
add h5py to yml
gcroci2 Jan 19, 2024
141b981
add missing deps to the yml file
gcroci2 Jan 19, 2024
97ac8f5
fix packages errors
gcroci2 Jan 19, 2024
c551e47
fix markov-clustering installation
gcroci2 Jan 19, 2024
c1c0c63
remove env name
gcroci2 Jan 19, 2024
144d817
put dependencies back to the toml
gcroci2 Jan 19, 2024
4efc2ba
remove pdb2sql from requirements.txt
gcroci2 Jan 19, 2024
7bc996d
re-add python to yml file
gcroci2 Jan 19, 2024
36796c8
readd deeprank2 to requirements.txt
gcroci2 Jan 19, 2024
986b6dd
readd macos deps installation
gcroci2 Jan 19, 2024
c41f132
remove dssp from yml - the dockerfile does not need to be edited at t…
gcroci2 Jan 19, 2024
d8336aa
update docker file with dssp via conda
gcroci2 Jan 19, 2024
de7ad35
update docs for new dssp installation
gcroci2 Jan 19, 2024
edf44c4
merge with dev
gcroci2 Jan 22, 2024
3a7d69b
add init file to the tests folder
gcroci2 Jan 22, 2024
a9ce360
add cov fail under 80
gcroci2 Jan 22, 2024
b5767f6
ci: set up ruff linter
DaniBodor Jan 16, 2024
01ccd90
docs: reference new linter in documentation
DaniBodor Jan 16, 2024
c56854e
style: VS code automatic linting/formatting
DaniBodor Jan 16, 2024
bf66cac
style: implement new linter/formatter throughout code base
DaniBodor Jan 16, 2024
c7bbad2
style: check commented out code
DaniBodor Jan 16, 2024
054c7d3
test: suppress some warnings
DaniBodor Jan 16, 2024
7dc2526
style: format non-python files
DaniBodor Jan 16, 2024
fb46161
Merge pull request #554 from DeepRank/541_ruff_dbodor
DaniBodor Jan 22, 2024
db6a6a8
merge with dev
gcroci2 Jan 22, 2024
eb6acf8
formatting
DaniBodor Jan 23, 2024
486a8a1
suggestions to README
DaniBodor Jan 23, 2024
28b9735
fix typos and edit subtitles
gcroci2 Jan 24, 2024
7a4ce85
add toc to docs/installation.md
gcroci2 Jan 24, 2024
08a4013
remove python 3.11 from the CI
gcroci2 Jan 24, 2024
0b63729
add extra input for testing release in the action.yml
gcroci2 Jan 24, 2024
4521438
remove line to run the ci
gcroci2 Jan 24, 2024
ac12feb
Update docs/installation.md
gcroci2 Jan 24, 2024
f1519d2
Update docs/installation.md
gcroci2 Jan 24, 2024
8718308
Merge pull request #549 from DeepRank/534_conda_dssp_gcroci2
gcroci2 Jan 24, 2024
d6942f1
Merge branch 'dev' into 539_action_for_testing_release_gcroci2
gcroci2 Jan 24, 2024
386e6ff
add yml file for testing latest release
gcroci2 Jan 24, 2024
65a8b45
update build names
gcroci2 Jan 24, 2024
5f24eef
remove deeprank2 dir from env before installing latest release
gcroci2 Jan 24, 2024
f9ecec7
try to fix test-release input format
gcroci2 Jan 24, 2024
e9b4deb
try to fix if condition
gcroci2 Jan 24, 2024
bba34e3
try to fix if cond
gcroci2 Jan 24, 2024
3dc2d99
use string for if cond on test package input
gcroci2 Jan 24, 2024
0a4ae17
add pytest in latest version case
gcroci2 Jan 24, 2024
967e436
try to fix pytest installation
gcroci2 Jan 24, 2024
974cb60
make build latest release action run only when a release action is co…
gcroci2 Jan 25, 2024
469139f
rename test-release input to pkg-installation-type
gcroci2 Jan 25, 2024
bdf94d8
improve description for pkg-installation-type
gcroci2 Jan 25, 2024
cc82882
rename editable to repository
gcroci2 Jan 25, 2024
6d832e2
rename editable with github repo
gcroci2 Jan 25, 2024
6d0ca44
fix typo
gcroci2 Jan 25, 2024
67ce748
Merge pull request #560 from DeepRank/539_action_for_testing_release_…
gcroci2 Jan 25, 2024
2dae3ff
solve merge conflicts
gcroci2 Jan 25, 2024
2d5ef8b
merge dev with main
gcroci2 Jan 25, 2024
4ec3715
remove dev readme
gcroci2 Jan 25, 2024
1c9339a
bump2version
gcroci2 Jan 25, 2024
ac1703b
update date in the cff file
gcroci2 Jan 25, 2024
cc1ff13
update codacy badge
gcroci2 Jan 25, 2024
39 changes: 11 additions & 28 deletions .github/actions/install-python-and-package/action.yml
@@ -27,8 +27,10 @@ runs:
     with:
       update-conda: true
       python-version: ${{ inputs.python-version }}
-      conda-channels: anaconda
-  - run: conda --version
+      conda-channels: pytorch, pyg, bioconda, defaults, sbl, conda-forge
+  - run: |
+      conda --version
+      conda env list
     shell: bash {0}
   - name: Python info
     shell: bash -e {0}
@@ -41,16 +43,16 @@ runs:
       CMAKE_INSTALL_PREFIX: .local
     if: runner.os == 'Linux'
     run: |
-      # Install dependencies not handled by setuptools
+      # Install deeprank2 conda dependencies
       ## DSSP
-      sudo apt-get install -y dssp
+      conda install -c sbl dssp>=4.2.2.1
       ## MSMS
-      conda install -c bioconda msms
+      conda install -c bioconda msms>=2.6.1
       ## PyTorch, PyG, PyG adds
       ### Installing for CPU only on the CI
-      conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 -c pytorch
-      pip install torch_geometric==2.3.1
-      pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-$(python3 -c "import torch; print(torch.__version__)")+cpu.html
+      conda install pytorch=2.1.1 torchvision=0.16.1 torchaudio=2.1.1 cpuonly=2.0.* -c pytorch
+      conda install pyg=2.4.0 -c pyg
+      pip install torch_scatter==2.1.2 torch_sparse==0.6.18 torch_cluster==1.6.3 torch_spline_conv==1.2.2 -f https://data.pyg.org/whl/torch-2.1.0+cpu.html
   - name: Install dependencies on MacOS
     shell: bash {0}
     env:
@@ -59,26 +61,7 @@ runs:
     run: |
       # Install dependencies not handled by setuptools
       ## DSSP
-      git clone https://github.com/PDB-REDO/libcifpp.git --recurse-submodules
-      cd libcifpp
-      cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$HOME/.local -DCMAKE_BUILD_TYPE=Release
-      cmake --build build
-      cmake --install build
-      #######
-      git clone https://github.com/mhekkel/libmcfp.git
-      cd libmcfp
-      mkdir build
-      cd build
-      cmake ..
-      cmake --build .
-      cmake --install .
-      #######
-      git clone https://github.com/PDB-REDO/dssp.git
-      cd dssp
-      mkdir build
-      cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
-      cmake --build build
-      cmake --install build
+      conda install -c sbl dssp>=4.2.2.1
       ## MSMS
       cd /tmp/
       wget http://mgltools.scripps.edu/downloads/tars/releases/MSMSRELEASE/REL2.6.1/msms_i86Linux2_2.6.1.tar.gz
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -37,7 +37,7 @@ jobs:
       fail-fast: false
       matrix:
         os: ["ubuntu-latest"]
-        python-version: ["3.10", "3.11"]
+        python-version: ["3.10"] # ["3.10", "3.11"]

     steps:
       - uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/coveralls.yml
@@ -46,7 +46,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
           extras-require: test
       - name: Run unit tests with coverage
-        run: pytest --cov --cov-append --cov-report xml --cov-report term --cov-report html
+        run: pytest --cov --cov-append --cov-report xml --cov-fail-under=80 --cov-report term --cov-report html
       - name: Coveralls
         env:
           GITHUB_TOKEN: ${{ secrets.github_token }}
24 changes: 10 additions & 14 deletions Dockerfile
@@ -2,31 +2,27 @@
 FROM --platform=linux/x86_64 condaforge/miniforge3:23.3.1-1

 # Add files
-ADD ./tutorials /home/deeprank2/tutorials
+ADD ./tutorials /home/deeprank2/tutorials
 ADD ./env/environment.yml /home/deeprank2
 ADD ./env/requirements.txt /home/deeprank2

 # Install
 RUN \
-apt update -y && \
-apt install unzip -y && \
+apt update -y &&
+apt install unzip -y &&
 ## GCC
-apt install -y gcc && \
+apt install -y gcc &&
-## DSSP
-wget https://github.com/PDB-REDO/dssp/releases/download/v4.4.0/mkdssp-4.4.0-linux-x64 && \
-mv mkdssp-4.4.0-linux-x64 /usr/local/bin/mkdssp && \
-chmod a+x /usr/local/bin/mkdssp && \
 ## Conda and pip deps
-mamba env create -f /home/deeprank2/environment.yml && \
+mamba env create -f /home/deeprank2/environment.yml &&
 ## Get the data for running the tutorials
-if [ -d "/home/deeprank2/tutorials/data_raw" ]; then rm -Rf /home/deeprank2/tutorials/data_raw; fi && \
-if [ -d "/home/deeprank2/tutorials/data_processed" ]; then rm -Rf /home/deeprank2/tutorials/data_processed; fi && \
-wget https://zenodo.org/records/8349335/files/data_raw.zip && \
-unzip data_raw.zip -d data_raw && \
+if [ -d "/home/deeprank2/tutorials/data_raw" ]; then rm -Rf /home/deeprank2/tutorials/data_raw; fi &&
+if [ -d "/home/deeprank2/tutorials/data_processed" ]; then rm -Rf /home/deeprank2/tutorials/data_processed; fi &&
+wget https://zenodo.org/records/8349335/files/data_raw.zip &&
+unzip data_raw.zip -d data_raw &&
 mv data_raw /home/deeprank2/tutorials

 # Activate the environment
-RUN echo "source activate deeprank2" > ~/.bashrc
+RUN echo "source activate deeprank2" >~/.bashrc
 ENV PATH /opt/conda/envs/deeprank2/bin:$PATH

 # Define working directory
137 changes: 80 additions & 57 deletions README.md

Large diffs are not rendered by default.

53 changes: 53 additions & 0 deletions docs/features.md
@@ -22,6 +22,59 @@ def add_features(
pass
```

Additionally, the nomenclature of the custom feature should be added in `deeprank2.domain.edgestorage` or `deeprank2.domain.nodestorage`, depending on which type of feature it is.
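For illustration, registering the name of a node feature in the storage module amounts to defining a string constant. This is a minimal sketch: the constant `RESTYPE = "res_type"` is confirmed by the text further below, but the surrounding layout of `deeprank2.domain.nodestorage` is assumed:

```python
# Hypothetical sketch of a feature-name registration in
# deeprank2.domain.nodestorage (module layout assumed).
RESTYPE = "res_type"  # one-hot encoding of the amino acid residue
```

Feature modules then refer to the constant instead of the raw string, so a renamed feature only has to be updated in one place.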

As an example, this is the implementation of the node feature `res_type`, which represents the one-hot encoding of the amino acid residue and is defined in the `deeprank2.features.components` module:

```python
from typing import Optional

from deeprank2.domain import nodestorage as Nfeat
from deeprank2.molstruct.atom import Atom
from deeprank2.molstruct.residue import Residue, SingleResidueVariant
from deeprank2.utils.graph import Graph

def add_features(
    pdb_path: str,
    graph: Graph,
    single_amino_acid_variant: Optional[SingleResidueVariant] = None,
):
    for node in graph.nodes:
        if isinstance(node.id, Residue):
            residue = node.id
        elif isinstance(node.id, Atom):
            atom = node.id
            residue = atom.residue
        else:
            raise TypeError(f"Unexpected node type: {type(node.id)}")

        node.features[Nfeat.RESTYPE] = residue.amino_acid.onehot
```

`RESTYPE` is the name of the variable assigned to the feature `res_type` in `deeprank2.domain.nodestorage`. To use the feature through the DeepRank2 API, its module needs to be imported and specified during query processing:

```python
from deeprank2.features import components

feature_modules = [components]

# Save data into 3D-graphs only
hdf5_paths = queries.process(
    "<output_folder>/<prefix_for_outputs>",
    feature_modules=feature_modules,
)
```

Then, the feature `res_type` can be used from the DeepRank2 datasets API:

```python
from deeprank2.dataset import GraphDataset

node_features = ["res_type"]

dataset = GraphDataset(
    hdf5_path=hdf5_paths,
    node_features=node_features,
)
```

The following is a brief description of the features already implemented in the code-base, for each feature module.

## Default node features
105 changes: 55 additions & 50 deletions docs/installation.md
@@ -1,53 +1,62 @@
-# Installations
+# Table of contents

-The package officially supports ubuntu-latest OS only, whose functioning is widely tested through the continuous integration workflows.
+- [Table of contents](#table-of-contents)
+- [Installation](#installation)
+  - [Containerized Installation](#containerized-installation)
+  - [Local/remote installation](#localremote-installation)
+    - [YML file installation](#yml-file-installation)
+    - [Manual installation](#manual-installation)
+  - [Testing DeepRank2 installation](#testing-deeprank2-installation)
+- [Contributing](#contributing)

-You can either install DeepRank2 in a [dockerized container](#containerized-installation), which will allow you to run our [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials), or you can [install the package locally](#localremote-installation).
+# Installation

-## Containerized Installation
+There are two ways to install DeepRank2:

-In order to try out the package without worrying about your OS and without the need of installing all the required dependencies, we created a `Dockerfile` that can be used for taking care of everything in a suitable container. After having cloned the repository and installed [Docker](https://docs.docker.com/engine/install/), run the following commands (you may need to have sudo permission) from the root of the repository.
+1. In a [dockerized container](#containerized-installation). This allows you to use DeepRank2, including all the notebooks within the container (a protected virtual space), without worrying about your operating system or installation of dependencies.
+   - We recommend this installation for inexperienced users and to learn to use or test our software, e.g. using the provided [tutorials](tutorials/TUTORIAL.md). However, resources might be limited in this installation and we would not recommend using it for large datasets or on high-performance computing facilities.
+2. [Local installation](#localremote-installation) on your system. This allows you to use the full potential of DeepRank2, but requires a few additional steps during installation.
+   - We recommend this installation for more experienced users, for larger projects, and for (potential) [contributors](#contributing) to the codebase.

-Build the Docker image:
+## Containerized Installation

-```bash
-docker build -t deeprank2 .
-```
+In order to try out the package without worrying about your OS and without the need of installing all the required dependencies, we created a `Dockerfile` that can be used for taking care of everything in a suitable container.

-Run the Docker container:
+For this, you first need to install [Docker](https://docs.docker.com/engine/install/) on your system. Then run the following commands. You may need to have sudo permission for some steps, in which case the commands below can be preceded by `sudo`:

 ```bash
+# Clone the DeepRank2 repository and enter its root directory
+git clone https://github.com/DeepRank/deeprank2
+cd deeprank2
+
+# Build and run the Docker image
+docker build -t deeprank2 .
 docker run -p 8888:8888 deeprank2
 ```

-This assumes that your application inside the container is listening on port 8888, and you want to map it to port 8888 on your host machine. Open a browser and go to `http://localhost:8888` to access the application running inside the Docker container and run the tutorials' notebooks.
+Next, open a browser and go to `http://localhost:8888` to access the application running inside the Docker container. From there you can use DeepRank2, e.g. to run the tutorial notebooks.

-More details about the tutorials' content can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that in the docker container only the raw PDB files are downloaded, needed as a starting point for the tutorials. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials'. Please install the package locally to fully leverage its capabilities.
+More details about the tutorials' contents can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that in the docker container only the raw PDB files are downloaded, which needed as a starting point for the tutorials. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please [install the package locally](#localremote-installation) to fully leverage its capabilities.

-After running the tutorials, you may want to remove the (quite large) Docker image from your machine. In this case, remember to [stop the container](https://docs.docker.com/engine/reference/commandline/stop/) and then [remove the image](https://docs.docker.com/engine/reference/commandline/image_rm/). More general information about Docker can be found on the [official website docs](https://docs.docker.com/get-started/).
+If after running the tutorials you want to remove the (quite large) Docker image from your machine, you must first [stop the container](https://docs.docker.com/engine/reference/commandline/stop/) and can then [remove the image](https://docs.docker.com/engine/reference/commandline/image_rm/). More general information about Docker can be found on the [official website docs](https://docs.docker.com/get-started/).

 ## Local/remote installation

-### Non-pythonic dependencies
-
-Instructions are up to date as of 27 Nov 2023.
-
-Before installing deeprank2 you need to install some dependencies:
-
-* [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
-  * Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
-    * on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
-    * on older versions of ubuntu or on mac or lacking sudo priviliges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed. Alternatively, follow [this](https://github.com/PDB-REDO/libcifpp/issues/49) thread.
-* [GCC](https://gcc.gnu.org/install/)
-  * Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
+Local installation is formally only supported on the latest stable release of ubuntu, for which widespread automated testing through continuous integration workflows has been set up. However, it is likely that the package runs smoothly on other operating systems as well.

-### Pythonic dependencies
+Before installing DeepRank2 please ensure you have [GCC](https://gcc.gnu.org/install/) installed: if running `gcc --version` gives an error, run `sudo apt-get install gcc`.

-Instructions are up to date as of 27 Nov 2023.
+#### YML file installation

-Then, you can use the YML file we provide for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of the package and all the other necessary conda and pip dependencies (CPU only, Python 3.10):
+You can use the provided YML file for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of DeepRank2 and all its dependencies.
+This will install the CPU-only version of DeepRank2 on Python 3.10.
+Note that this will not work for MacOS. Do the [Manual Installation](#manual-installation) instead.

 ```bash
+# Clone the DeepRank2 repository and enter its root directory
+git clone https://github.com/DeepRank/deeprank2
+cd deeprank2
+
 # Ensure you are in your base environment
 conda activate
 # Create the environment
@@ -56,26 +65,24 @@ conda env create -f env/environment.yml
conda activate deeprank2
```

Alternatively, if you are a MacOS user, if the YML file installation is not successfull, or if you want to use CUDA or Python 3.11, you can install each dependency separately, and then the latest stable release of the package using the PyPi package manager. Also in this case, we advise to use a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). In case of issues during installation, please refer to the official documentation for each package (linked below), as our instructions may be out of date:
See instructions below to [test](#testing-deeprank2-installation) that the installation was succesful.

### Manual installation

If you want to use the GPUs, choose a specific python version, are a MacOS user, or if the YML installation was not succesful, you can install the package manually. We advise to do this inside a [conda virtual environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
If you have any issues during installation of dependencies, please refer to the official documentation for each package (linked below), as our instructions may be out of date (last tested on 19 Jan 2024):

- [DSSP 4](https://anaconda.org/sbl/dssp): `conda install -c sbl dssp`
- [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`
  - MacOS users with an M1 chip should follow [these instructions](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) instead.
- [PyTorch](https://pytorch.org/get-started/locally/): `conda install pytorch torchvision torchaudio cpuonly -c pytorch`
  - PyTorch regularly publishes updates, and not all newer versions will work stably with DeepRank2. Currently, the package is tested using [PyTorch 2.1.1](https://pytorch.org/get-started/previous-versions/#v211).
  - We support torch's CPU library as well as CUDA.
- [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
  - The exact command to install PyG depends on the version of PyTorch you are using. Please refer to the source's installation instructions (we recommend the pip installation, as it also shows the command for the dependencies).
- For MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
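Before generating any features, it can help to confirm that the external binaries listed above actually ended up on your `PATH`. A minimal stdlib sketch (the helper name `check_tools` is ours, not part of DeepRank2):

```python
import shutil


def check_tools(tools):
    """Return a dict mapping each tool name to whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


# The external binaries used by DeepRank2's feature pipeline:
for tool, found in check_tools(["msms", "dssp"]).items():
    print(f"{tool}: {'found' if found else 'MISSING - see the install notes above'}")
```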

Finally install deeprank2 itself: `pip install deeprank2`.
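To confirm the install took, you can query the package metadata from the standard library; a small sketch (the helper name `installed_version` is ours):

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if deeprank2 is installed, None otherwise.
print(installed_version("deeprank2"))
```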

Alternatively, get the latest updates by cloning the repo and installing the editable version of the package with:

```bash
git clone https://github.com/DeepRank/deeprank2
cd deeprank2
pip install -e .'[test]'
```

The `test` extra is optional, and can be used to install test-related dependencies, useful during development.

### Testing DeepRank2 installation

If you have installed the package from a cloned repository (the latter option above), you can check that all components were installed correctly, using pytest (run `pip install pytest` if you did not install it above). We especially recommend doing this if you installed DeepRank2 and its dependencies manually.

The quick test should be sufficient to ensure that the software works, while the full test will cover a much broader range of settings to ensure everything is correct. Run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).
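If you prefer to drive the two suites from a script (e.g. in CI), the invocations above can be wrapped as follows; a sketch, assuming pytest is installed, with helper names that are ours:

```python
import subprocess


def pytest_command(quick=True):
    """Return the pytest invocation: quick integration test or full suite."""
    return ["pytest", "tests/test_integration.py"] if quick else ["pytest"]


def run_tests(quick=True):
    """Run the chosen suite and return pytest's exit code (0 means all passed)."""
    return subprocess.run(pytest_command(quick), check=False).returncode
```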

# Contributing

If you would like to contribute to the package in any way, please see [our guidelines](CONTRIBUTING.rst).

The following section serves as a first guide to start using the package, using protein-protein interface (PPI) queries as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SRV data, and for the training pipeline.
For more details, see the [extended documentation](https://deeprank2.rtfd.io/).
5 changes: 4 additions & 1 deletion env/environment.yml
@@ -4,10 +4,13 @@ channels:
- pyg
- bioconda
- defaults
- conda-forge
- sbl
dependencies:
- pip==23.3.*
- python==3.10.*
- msms==2.6.1
- dssp>=4.2.2.1
- pytorch==2.1.1
- pytorch-mutex==1.0.*
- torchvision==0.16.1
@@ -16,4 +19,4 @@ dependencies:
- pyg==2.4.0
- notebook==7.0.6
- pip:
- --requirement requirements.txt
- --requirement requirements.txt
18 changes: 12 additions & 6 deletions tutorials/data_generation_ppi.ipynb
@@ -145,7 +145,7 @@
"pdb_files, bas = get_pdb_files_and_target_data(data_path)\n",
"\n",
"if limit_data:\n",
"\tpdb_files = pdb_files[:15]"
" pdb_files = pdb_files[:15]"
]
},
{
@@ -213,7 +213,7 @@
" if count % 20 == 0:\n",
" print(f\"{count} queries added to the collection.\")\n",
"\n",
"print(f\"Queries ready to be processed.\\n\")"
"print(\"Queries ready to be processed.\\n\")"
]
},
{
@@ -255,7 +255,9 @@
" grid_map_method=grid_map_method,\n",
")\n",
"\n",
"print(f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"residue\")}.')"
"print(\n",
" f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"residue\")}.'\n",
")"
]
},
{
@@ -358,7 +360,9 @@
"metadata": {},
"outputs": [],
"source": [
"fname = os.path.join(processed_data_path, \"residue\", \"_\".join([\"res_mass\", \"distance\", \"electrostatic\"]))\n",
"fname = os.path.join(\n",
" processed_data_path, \"residue\", \"_\".join([\"res_mass\", \"distance\", \"electrostatic\"])\n",
")\n",
"dataset.save_hist(features=[\"res_mass\", \"distance\", \"electrostatic\"], fname=fname)\n",
"\n",
"im = img.imread(fname + \".png\")\n",
@@ -450,7 +454,7 @@
" if count % 20 == 0:\n",
" print(f\"{count} queries added to the collection.\")\n",
"\n",
"print(f\"Queries ready to be processed.\\n\")"
"print(\"Queries ready to be processed.\\n\")"
]
},
{
@@ -476,7 +480,9 @@
" grid_map_method=grid_map_method,\n",
")\n",
"\n",
"print(f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"atomic\")}.')"
"print(\n",
" f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"atomic\")}.'\n",
")"
]
},
{
16 changes: 11 additions & 5 deletions tutorials/data_generation_srv.ipynb
@@ -151,7 +151,7 @@
"pdb_files, res_numbers, res_wildtypes, res_variants, targets = get_pdb_files_and_target_data(data_path)\n",
"\n",
"if limit_data:\n",
"\tpdb_files = pdb_files[:15]"
" pdb_files = pdb_files[:15]"
]
},
{
@@ -266,7 +266,9 @@
" grid_map_method=grid_map_method,\n",
")\n",
"\n",
"print(f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"residue\")}.')"
"print(\n",
" f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"residue\")}.'\n",
")"
]
},
{
@@ -376,7 +378,9 @@
"metadata": {},
"outputs": [],
"source": [
"fname = os.path.join(processed_data_path, \"residue\", \"_\".join([\"res_mass\", \"distance\", \"electrostatic\"]))\n",
"fname = os.path.join(\n",
" processed_data_path, \"residue\", \"_\".join([\"res_mass\", \"distance\", \"electrostatic\"])\n",
")\n",
"dataset.save_hist(features=[\"res_mass\", \"distance\", \"electrostatic\"], fname=fname)\n",
"\n",
"im = img.imread(fname + \".png\")\n",
@@ -470,7 +474,7 @@
" if count % 20 == 0:\n",
" print(f\"{count} queries added to the collection.\")\n",
"\n",
"print(f\"Queries ready to be processed.\\n\")"
"print(\"Queries ready to be processed.\\n\")"
]
},
{
@@ -496,7 +500,9 @@
" grid_map_method=grid_map_method,\n",
")\n",
"\n",
"print(f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"atomic\")}.')"
"print(\n",
" f'The queries processing is done. The generated HDF5 files are in {os.path.join(processed_data_path, \"atomic\")}.'\n",
")"
]
},
{