T032: Compound activity: Proteochemometrics #278

gorostiolam · 2022-10-17T14:18:25Z

New talktorial

Details

Talktorial ID: T032
Title: Compound activity: Proteochemometrics
Original authors: Marina Gorostiola González, Olivier J.M. Béquignon, Willem Jespers, 2022
ReviewNB: https://app.reviewnb.com/volkamerlab/teachopencadd/pull/278/

Content review

Potential labels or categories (e.g. machine learning, small molecules, online APIs): machine learning, proteochemometrics
One line summary: In this talktorial, we use Proteochemometrics modelling (PCM) to enrich our activity models with protein data to predict the activity of novel compounds against the four adenosine receptor isoforms (A1, A2A, A2B, A3).
I have used the talktorial template and followed the formatting suggested there
The table of contents reflects the talktorial story-line; order of #, ##, ### headers is correct
URLs are linked with meaningful words, instead of pasting the URL directly or linking words like here.
I have spell-checked the notebook
Images have enough resolution to be rendered with quality, without being too heavy.
All figures have a description
Markdown cell content is still in-line with code cell output (whenever results are discussed)
I have checked that cell outputs are not incredibly long (this applies also to DataFrames)
Formatting looks correct on the Sphinx render (bold, italics, figure placing)
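
As a rough illustration of the one-line summary above: a PCM model describes each data point by both a compound and a protein, typically by concatenating the two descriptor vectors. The sketch below is not taken from the talktorial; the function name `build_pcm_features` and all descriptor values are made up for illustration.

```python
def build_pcm_features(pairs, compound_descriptors, protein_descriptors):
    """Build one feature row per (compound, protein) pair.

    Each row is the compound descriptor vector followed by the protein
    descriptor vector, which is the basic idea behind PCM inputs.
    """
    rows = []
    for compound_id, protein_id in pairs:
        compound_vector = list(compound_descriptors[compound_id])
        protein_vector = list(protein_descriptors[protein_id])
        rows.append(compound_vector + protein_vector)
    return rows


# Hypothetical toy descriptors (not real data)
compound_descriptors = {"cpd_1": [0.1, 0.7], "cpd_2": [0.4, 0.2]}
protein_descriptors = {"A1": [1.0, 0.0, 0.5], "A2A": [0.9, 0.1, 0.4]}

# Only pairs with a measured activity would be used for training
measured_pairs = [("cpd_1", "A1"), ("cpd_1", "A2A"), ("cpd_2", "A2A")]

features = build_pcm_features(measured_pairs, compound_descriptors, protein_descriptors)
for pair, row in zip(measured_pairs, features):
    print(pair, row)
```

A plain QSAR model would use only the compound part of each row and would need one model per receptor; the concatenated rows allow a single model across all four isoforms.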

Code review

Time it took to execute (approx.): 6 min
Variable and function names follow snake case rules (e.g. a_variable_name vs aVariableName)
Spacing follows PEP8 (run Black on the code cells if needed)
Code lines are under 99 characters each (run black -l 99)
Comments are useful and well placed
There are no unpythonic idioms like for i in range(len(list)) (see slides)
All 3rd party dependencies are listed at the top of the notebook
I have marked all code cells with output referenced in markdown cells with the label # NBVAL_CHECK_OUTPUT
All import ... lines are at the top (practice part) cell, ordered by standard library / 3rd party packages / our own (teachopencadd.*)
I have updated the relative paths to absolute paths.
```
from pathlib import Path
HERE = Path(_dh[-1])  # _dh: IPython directory history (only defined inside a notebook)
DATA = HERE / "data"
```
List here unfamiliar libraries you find in the imports and their intended use:

papyrus_scripts (https://github.com/OlivierBeq/Papyrus-scripts/tarball/master): scripts to work with Papyrus dataset
prodec (pip installable): protein descriptor generator
mordred (conda installable -c conda-forge): molecular descriptor generator
rich-msa (pip installable): visualization of multiple sequence alignments
REMOVED wget (conda installable -c anaconda): download ClustalO binary files (NOTE: actively working on a different way to use ClustalO to generate the MSA so an installation is not required)
xmltramp2 (pip installable; also conda installable via -c bioconda): dependency for ClustalO REST API

Questions

Currently, the required dependencies are listed in the t032_env.yml file. They need to be reduced to the minimum and incorporated into the main .yml environment file. Whose task is this? > @dominiquesydow will do this
The ClustalO REST API requires a valid email address to submit queries. Is there an institutional email that we can use there? Otherwise, we could create a mock Gmail account. (No emails are sent, but the email address is required.)
- @dominiquesydow's fix: By default no email address (instructions for the user to set an email if interested in using ClustalO with their own data; otherwise using the pre-calculated aligned_sequences.aln-fasta.fasta); updated gitignore accordingly
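
The fallback described in this fix could look roughly like the sketch below. This is not the notebook's actual code: the function names `read_fasta_alignment` and `get_alignment`, and the `run_clustalo` callback standing in for the ClustalO REST client, are assumptions made for illustration.

```python
from pathlib import Path


def read_fasta_alignment(path):
    """Parse an aligned FASTA file into {header: aligned_sequence}."""
    sequences = {}
    header = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            sequences[header] = ""
        elif header is not None:
            # Sequences may be wrapped over several lines
            sequences[header] += line
    return sequences


def get_alignment(precomputed_path, email=None, run_clustalo=None):
    """Use the pre-calculated alignment unless the user sets an email.

    `run_clustalo` is a hypothetical callable that submits the job with the
    given email and returns the alignment; it is only called if an email is set.
    """
    if email is None:
        return read_fasta_alignment(precomputed_path)
    if run_clustalo is None:
        raise ValueError("An email was set, but no ClustalO client function was given.")
    return run_clustalo(email)
```

With `email=None` (the default), no web request is made, so the notebook can run without any user-specific configuration.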

Status

Ready to go

…o 032

review-notebook-app · 2022-10-17T14:18:29Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

teachopencadd/talktorials/T032_compound_activity_proteochemometrics/talktorial.ipynb

AndreaVolkamer · 2022-10-25T20:01:42Z

@gorostiolam Great talktorial, thanks for your contribution! I reviewed the theory part for now, and only added small comments (see above). Let me know if you have any questions.
@dominiquesydow feel free to continue with the code parts.

dominiquesydow · 2022-10-25T20:33:18Z

Great, thanks a lot - I'll take over tomorrow evening!

gorostiolam · 2022-10-26T10:27:45Z

@AndreaVolkamer thanks for the comments! I have implemented them and will do the same when the code revision by @dominiquesydow is done.

dominiquesydow · 2022-11-07T22:10:11Z

Hi everyone,

This is shaping up beautifully!!

I am pushing the following updates:

By default no email address (instructions for the user to set an email if interested in using ClustalO with their own data; otherwise using the pre-calculated aligned_sequences.aln-fasta.fasta); updated gitignore accordingly --- definitely up for discussion
Enabled website rendering; if you'd like to see the rendering for yourself, follow instructions here
Minor sorting update on imports, e.g. no paragraphs between third party imports
Uniprot > UniProt (was a mix)
Fix website rendering issues, e.g. always new paragraph before lists

Open TODOs still:

Proofread the rendered website (apparently, we cannot have HTML tags such as <b> but need Markdown **, at least I do not see any formatting)
At the end: Env file and CI set for all notebooks (currently only T032 for a faster CI check)
At the end: black-nb and README generator
@AndreaVolkamer, thanks a lot for looking into ClustalO alternatives! I do not have the bandwidth at the moment to adapt this; ClustalO seems like a commonly used service for this task, hence IMO good to showcase it here but definitely worth looking into the alternative more (maybe as a start raising this in the discussion and adapt in the future?)

gorostiolam · 2022-11-08T10:37:40Z

Hi!

These commits seem very appropriate! I agree that adding the option to read pre-computed ClustalO alignment or otherwise use your own email is probably the easiest fix here!

Regarding website rendering, I also cannot see formatting so indeed HTML tags will need to be substituted by Markdown. I have committed an update from HTML to Markdown formatting and now it renders correctly for me.

dominiquesydow · 2022-11-21T09:32:53Z

Hi @gorostiolam, my apologies for the hold-up from my side, I will get back onto this PR tonight - thank you very much for the updates on the Markdown rendering!

dominiquesydow · 2022-11-21T21:32:37Z

@AndreaVolkamer, I am testing whether installing the pip dependencies within the notebook itself works as an interim solution (this would avoid installing many new packages for the full TOC stack).

Would look like this in the notebook:

Before we start, let's install a few packages that are not part of TeachOpenCADD's global environment file because they are only relevant to this notebook (this setup will change in the future, see discussion here).
!pip install prodec rich-msa xmltramp2
!pip install git+https://github.com/OlivierBeq/Papyrus-scripts.git

With a comment in our env file:

teachopencadd/devtools/test_env.yml

Lines 76 to 83 in a6cd9df

    # T032
    # The following pip packages are currently installed in the notebook itself because they are only used there, thereby avoiding the addition of more dependencies to our already quite large environment file.
    # Follow this discussion on how we try to simplify our environment setup in the future: https://github.com/volkamerlab/teachopencadd/discussions/277
    # - https://github.com/OlivierBeq/Papyrus-scripts/tarball/master
    # - prodec
    # - rich-msa
    # Dependency for ClustalO webservice (also conda installable via -c bioconda)
    # - xmltramp2
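
A possible variation on the in-notebook `!pip install` lines is to install only what is missing, using the interpreter that runs the notebook kernel. The sketch below is an assumption-laden illustration, not the notebook's code: the helper names `missing_packages` and `pip_install` are hypothetical, and the mapping from import names (`prodec`, `rich_msa`, `xmltramp2`) to pip names is assumed rather than taken from this PR.

```python
import importlib.util
import subprocess
import sys


def missing_packages(import_to_requirement):
    """Return pip requirements whose import name cannot be found."""
    return [
        requirement
        for import_name, requirement in import_to_requirement.items()
        if importlib.util.find_spec(import_name) is None
    ]


def pip_install(requirements, dry_run=True):
    """Build (and optionally run) a pip command for the current interpreter."""
    command = [sys.executable, "-m", "pip", "install", *requirements]
    if requirements and not dry_run:
        subprocess.check_call(command)
    return command


# Assumed import-name -> pip-requirement mapping (illustrative only)
t032_packages = {
    "prodec": "prodec",
    "rich_msa": "rich-msa",
    "xmltramp2": "xmltramp2",
}

to_install = missing_packages(t032_packages)
# dry_run=True only prints the command; set dry_run=False to actually install
print(" ".join(pip_install(to_install, dry_run=True)))
```

Using `sys.executable -m pip` instead of a bare `pip` helps ensure the packages land in the same environment as the running kernel.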

Remaining TODOs:

Set env file and CI for all notebooks (currently only T032 for a faster CI check)
Check upcoming CI fails
Check Windows failure for T032 > for now, no support under Windows, see "Note: T032 not available under Windows" (#318)

dominiquesydow · 2023-01-02T20:37:24Z

In 8769fd5, T018 fails in this new environment for all but Ubuntu Python 3.9. Difference in environment:

# packages in passing Ubuntu Py3.9 version - 426
# packages in failing Ubuntu Py3.8 version - 425

name            python
pass_version    3.9.15
fail_version    3.8.15
Name: 306, dtype: object
name            python_abi
pass_version           3.9
fail_version           3.8
Name: 312, dtype: object
name            tzdata
pass_version     2022g
fail_version       NaN
Name: 388, dtype: object
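
A comparison like the one above can be reproduced with a few lines of Python. The sketch below uses plain dictionaries rather than whatever tooling produced the output shown; the function name `diff_environments` is hypothetical, and `example_pkg` is a made-up entry standing in for the many packages that are identical in both environments.

```python
def diff_environments(passing, failing):
    """Return (name, pass_version, fail_version) for packages that differ.

    A version of None means the package is absent from that environment.
    """
    differences = []
    for name in sorted(set(passing) | set(failing)):
        pass_version = passing.get(name)
        fail_version = failing.get(name)
        if pass_version != fail_version:
            differences.append((name, pass_version, fail_version))
    return differences


# Versions taken from the comparison above, plus one made-up shared package
passing_env = {"python": "3.9.15", "python_abi": "3.9", "tzdata": "2022g", "example_pkg": "1.0"}
failing_env = {"python": "3.8.15", "python_abi": "3.8", "example_pkg": "1.0"}

for name, pass_version, fail_version in diff_environments(passing_env, failing_env):
    print(f"{name}: pass={pass_version} fail={fail_version}")
```

In practice the two dictionaries would be filled from the package listings of the passing and failing CI jobs.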

Wait for next CI run to end - same issue?

Locally (MacOS, M1 chip) all tests pass for Python 3.9:

986 passed, 2 skipped, 56 warnings

Merging recent master in mgg-032

AndreaVolkamer · 2023-05-16T08:00:52Z

We observed issue #358 when running the code locally; it needs to be addressed before a potential merge.

AndreaVolkamer · 2024-03-05T21:25:14Z

@mbackenkoehler, just wondering whether we could include this now, given that we are updating the environments anyway?

gorostiolam added 10 commits October 7, 2022 15:52

Start branch

5514df0

Start T029 talktorial on proteochemometrics (PCM)

4d0136c

Add environment file

4267bcc

Completed Theory draft

d3efd75

Added first practical part to download, read and filter Papyrus dataset

af8bdb0

Added figure with splitting methods

11ff43f

Add section to download clustalO (Win,Unix,Mac) and perform MSA

1d7f445

Add libraries for installing ClustalO and visualizing MSA

161a3bf

Update tutorial number to 032

a460d8e

Add PCM and QSAR training and validation and update tutorial number to 032

45a60d1

gorostiolam added 11 commits October 18, 2022 11:19

Change dependencies to use ClustalO REST API

74eee29

Update code to use ClustalO REST API instead of binary download

1c122e5

Add ClustalO REST API client

054ccfe

Add modelling interpretation and discussion. Make modelling output neat.

73a6c11

Remove contribution cell and update practical list of contents

11bee4e

Update README file with intro from talktorial

045b2de

Update data README file

284d50e

Update figures README file

a094049

Update Papyrus workflow image

59b8d68

Update Papyrus workflow image

bed66ab

Proofread grammar

1616aeb

gorostiolam requested review from AndreaVolkamer and dominiquesydow October 18, 2022 15:39

Grammar and code revision by Olivier

d5319f4

AndreaVolkamer reviewed Oct 25, 2022

Theory revision based on Andrea's review.

8b061b5

dominiquesydow added 5 commits November 7, 2022 20:32

Docs config: Set language to "en" (not None)

c84f5dd

README: Fix broken conda-forge badge

feacb89

Docs: Add T032 nblink file

56d1398

T032: Add pre-calculated alignments (ClustalO)

1717749

T032: Set email to None; more formatting

c76c555

gorostiolam added 2 commits November 8, 2022 13:37

Update HTML formatting to Markdown

e016da8

Update conflicting formatting in regression evaluation metrics

fadc2d6

dominiquesydow added 7 commits November 21, 2022 20:43

T032: Move pip installs from env file to notebook itself

778062d

Satisfy black-nb

13f7b5c

Regenerate READMEs

5d52451

T032: Rerun notebook & add NBVAL_CHECK_OUTPUT checks

3732157

T032: Fix typos

4e5da17

T032: Remove thumbnail (talktorial has no pure png outputs we can use)

a6cd9df

T032: Fix typo [skip ci]

5fd7aae

dominiquesydow added 5 commits January 2, 2023 18:51

Env: Sync env with latest master

3c0eb9d

README: Sync with master README

a4ba93f

CI: Sync with master CI

4ee706d

Merge branch 'master' into mgg-032-compound_activity_proteochemometrics

8769fd5

CI: Drop T032 under Windows

265ae70

dominiquesydow and others added 2 commits January 2, 2023 21:38

CI: Add env list after T032-specific package installations (tmp)

4bf8bbc

Merge pull request #327 from volkamerlab/master

1c99cde

Merging recent master in mgg-032

trigger CI

63ea5c6

AndreaVolkamer requested a review from mbackenkoehler March 5, 2024 21:24
