Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

T032: Compound activity: Proteochemometrics #278

Open
wants to merge 66 commits into
base: master
Choose a base branch
from

Conversation

gorostiolam
Copy link
Collaborator

@gorostiolam gorostiolam commented Oct 17, 2022

New talktorial

Details

Content review

  • Potential labels or categories (e.g. machine learning, small molecules, online APIs): machine learning, proteochemometrics
  • One line summary: In this talktorial, we use Proteochemometrics modelling (PCM) to enrich our activity models with protein data to predict the activity of novel compounds against the four adenosine receptor isoforms (A1, A2A, A2B, A3).
  • I have used the talktorial template and followed the formatting suggested there
  • The table of contents reflects the talktorial story-line; order of #, ##, ### headers is correct
  • URLs are linked with meaningful words, instead of pasting the URL directly or linking words like here.
  • I have spell-checked the notebook
  • Images have enough resolution to be rendered with quality, without being too heavy.
  • All figures have a description
  • Markdown cell content is still in-line with code cell output (whenever results are discussed)
  • I have checked that cell outputs are not incredibly long (this applies also to DataFrames)
  • Formatting looks correctly on the Sphinx render (bold, italics, figure placing)

Code review

  • Time it took to execute (approx.): 6 min
  • Variable and function names follow snake case rules (e.g. a_variable_name vs aVariableName)
  • Spacing follows PEP8 (run Black on the code cells if needed)
  • Code line are under 99 characters each (run black -l 99)
  • Comments are useful and well placed
  • There are no unpythonic idioms like for i in range(len(list)) (see slides)
  • All 3rd party dependencies are listed at the top of the notebook
  • I have marked all code cell with output referenced in markdown cells with the label # NBVAL_CHECK_OUTPUT
  • All import ... lines are at the top (practice part) cell, ordered by standard library / 3rd party packages / our own (teachopencadd.*)
  • I have update the relative paths to absolute paths.
    HERE = Path(_dh[-1])
    DATA = HERE / "data"
  • List here unfamiliar libraries you find in the imports and their intended use:
  • papyrus_scripts (https://github.com/OlivierBeq/Papyrus-scripts/tarball/master): scripts to work with Papyrus dataset
  • prodec (pip installable): protein descriptor generator
  • mordred (conda installable -c conda-forge): molecular descriptor generator
  • rich-msa (pip installable): visualization of multiple sequence alignments
  • REMOVED wget (conda installable -c anaconda): download ClustalO binary files (NOTE: actively working on a different way to use ClustalO to generate the MSA so an installation is not required)
  • xmltramp2 (pip installable; also conda installable via -c bioconda): dependency for ClustalO REST API

Questions

  • Currently required dependencies are listed on the t032_env.yml file. They need to be reduced to the minimum and incorporate in the main .yml environment file. Whose task is this? > @dominiquesydow will do this
  • ClustalO REST API requires valid email to submit queries. Is there an institutional email that we can use there? Otherwise we can also make a mock gmail account? (No emails are sent, but the email address is required)
    • @dominiquesydow's fix: By default no email address (instructions for user to set email if interested in using ClustalO with their own data; otherwise using pre-calculated aligned_sequences.aln-fasta.fasta; updated gitignore accordingly

Status

  • Ready to go

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@AndreaVolkamer
Copy link
Member

@gorostiolam Great talktorial, thanks for your contribution! I reviewed the theory part for now, and only added small comments (see above). Let me know if you have any questions.
@dominiquesydow fell free to continue with the code parts.

@dominiquesydow
Copy link
Collaborator

Great, thanks a lot - I'll take over tomorrow evening!

Copy link
Collaborator Author

@AndreaVolkamer thanks for the comments! I have implemented them and will do the same when the code revision by @dominiquesydow is done.

@dominiquesydow
Copy link
Collaborator

dominiquesydow commented Nov 7, 2022

Hi everyone,

This is shaping up beautifully!!

I am pushing the following updates:

  • By default no email address (instructions for user to set email if interested in using ClustalO with their own data; otherwise using pre-calculated aligned_sequences.aln-fasta.fasta; updated gitignore accordingly --- definitely up for discussion
  • Enabled website rendering, if you'd like to see the rendering for yourself, follow instructions here
  • Minor sorting update on imports, e.g. no paragraphs between third party imports
  • Uniprot > UniProt (was a mix)
  • Fix website rendering issues, e.g. always new paragraph before lists

Open TODOs still:

  • Proofread the rendered website (apparently, we cannot have HTML tags such as <b> but need Markdown **, at least I do not see any formatting)
  • At the end: Env file and CI set for all notebooks (currently only T032 for a faster CI check)
  • At the end: black-nb and README generator
  • @AndreaVolkamer, thanks a lot for looking into ClustalO alternatives! I do not have the bandwidth at the moment to adapt this; ClustalO seems like a commonly used service for this task, hence IMO good to showcase it here but definitely worth looking into the alternative more (maybe as a start raising this in the discussion and adapt in the future?)

@gorostiolam
Copy link
Collaborator Author

gorostiolam commented Nov 8, 2022

Hi everyone,

This is shaping up beautifully!!

I am pushing the following updates:

  • By default no email address (instructions for user to set email if interested in using ClustalO with their own data; otherwise using pre-calculated aligned_sequences.aln-fasta.fasta; updated gitignore accordingly --- definitely up for discussion
  • Enabled website rendering, if you'd like to see the rendering for yourself, follow instructions here
  • Minor sorting update on imports, e.g. no paragraphs between third party imports
  • Uniprot > UniProt (was a mix)
  • Fix website rendering issues, e.g. always new paragraph before lists

Open TODOs still:

  • Proofread the rendered website (apparently, we cannot have HTML tags such as <b> but need Markdown **, at least I do not see any formatting)
  • At the end: Env file and CI set for all notebooks (currently only T032 for a faster CI check)
  • At the end: black-nb and README generator
  • @AndreaVolkamer, thanks a lot for looking into ClustalO alternatives! I do not have the bandwidth at the moment to adapt this; ClustalO seems like a commonly used service for this task, hence IMO good to showcase it here but definitely worth looking into the alternative more (maybe as a start raising this in the discussion and adapt in the future?)

Hi!

These commits seem very appropriate! I agree that adding the option to read pre-computed ClustalO alignment or otherwise use your own email is probably the easiest fix here!

Regarding website rendering, I also cannot see formatting so indeed HTML tags will need to be substituted by Markdown. I have committed an update from HTML to Markdown formatting and now it renders correctly for me.

@dominiquesydow
Copy link
Collaborator

Hi @gorostiolam, my apologies for the hold-up from my side, I will get back onto this PR tonight - thank you very much for the updates on the Markdown rendering!

@dominiquesydow
Copy link
Collaborator

dominiquesydow commented Nov 21, 2022

@AndreaVolkamer, I am trying out if installing the pip dependencies within the notebook itself works as an intermediate solution (would avoid installing many new packages for the full TOC stack).

Would look like this in the notebook:

Before we start, let's install a few packages that are not part of TeachOpenCADD's global enviroment file because they are only relevant to this notebook (this setup will change in the future, see discussion here).
!pip install prodec rich-msa xmltramp2
!pip install git+https://github.com/OlivierBeq/Papyrus-scripts.git

With a comment in our env file:

# T032
# The following pip packages are currently installed in the notebook itself because they are only used there, thereby avoiding the addition of more dependencies to our already quite large environment file.
# Follow this discussion on how we try to simplify our environment setup in the future: https://github.com/volkamerlab/teachopencadd/discussions/277
# - https://github.com/OlivierBeq/Papyrus-scripts/tarball/master
# - prodec
# - rich-msa
# Dependency for ClustalO webservice (also conda installable via -c bioconda)
# - xmltramp2

Remaining TODOs:

@dominiquesydow
Copy link
Collaborator

dominiquesydow commented Jan 2, 2023

In 8769fd5, T018 fails in this new environment for all but Ubuntu Python 3.9. Difference in environment:

# packages in passing Ubuntu Py3.9 version - 426
# packages in failing Ubuntu Py3.8 version - 425

name            python
pass_version    3.9.15
fail_version    3.8.15
Name: 306, dtype: object
name            python_abi
pass_version           3.9
fail_version           3.8
Name: 312, dtype: object
name            tzdata
pass_version     2022g
fail_version       NaN
Name: 388, dtype: object

Wait for next CI run to end - same issue?

Locally (MacOS, M1 chip) all tests pass for Python 3.9:

986 passed, 2 skipped, 56 warnings

@AndreaVolkamer
Copy link
Member

we observed this issue #358 when running the code locally, needs to be addressed, before potential merge.

@AndreaVolkamer
Copy link
Member

@mbackenkoehler just wondering if we could include this now, given that we are updating the environments anyways?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new talktorial New talktorial
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants