Add compound-gene benchmarks (#82)
* add missing dependencies

* fix test.yaml

* let tox handle coverage

* add rxrx3 benchmarking notebook and reference data

* clean up

* Implement alternative compound-gene benchmarking  (#81)

* slow v1

* sampling v2

* fixes

* flexibility, v3

* fix

* tests

* move tests properly

* fix baseline

* add quantiles

* update tests

* newline

* format

* fix mypy errors

* fix pytests

* re-fix format

* write docstrings

---------

Co-authored-by: John Urbanik <[email protected]>
fedecomitani and johnurbanik authored Nov 11, 2024
1 parent 170176a commit 830569d
Showing 12 changed files with 4,033 additions and 107 deletions.
13 changes: 0 additions & 13 deletions .github/workflows/test.yaml
@@ -25,16 +25,3 @@ jobs:
         with:
           name: coverage.${{ matrix.os }}.py3${{ matrix.minor_version }}
           path: ./.coverage.py3${{ matrix.minor_version }}
-  coverage:
-    needs: tests
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v2
-      - uses: actions/[email protected]
-        with:
-          path: .
-      - run: pip install coverage
-      - run: |
-          rm -f .coverage
-          coverage combine coverage*/
-          coverage report --fail-under=0
14 changes: 14 additions & 0 deletions README.md
@@ -32,6 +32,8 @@ pip install efaar_benchmarking

First, run `notebooks/map_building_benchmarking.ipynb` for GWPS, cpg0016, and cpg0021 individually. This process will build each of these maps and report the perturbation signal and biological relationship benchmarks. Afterwards, run `notebooks/map_evaluation_comparison.ipynb` to explore the constructed maps using the methods presented in our paper. In order for the latter notebook to work, make sure to set the `save_results` parameter to True in the former notebook.

+`notebooks/rxrx3_benchmarking.ipynb` contains an example of extracting gene-gene relationships recall and compound-gene average precision scores for the public Rxrx3 dataset.
+
We've uploaded the 128-dimensional PCA-TVN maps we constructed for GWPS, cpg0016, and cpg0021 to the `notebooks/data` directory. So, for convenience, one can run `notebooks/map_evaluation_comparison.ipynb` directly on these uploaded map files if they wish to explore the maps further without running `notebooks/map_building_benchmarking.ipynb`. See `notebooks/data/LICENSE` for terms of use for each dataset.

RxRx3 embeddings are available as separate parquet files per plate in the embeddings.tar file, downloadable from https://rxrx3.rxrx.ai/downloads. Note that in this data, all but 733 genes are anonymized.
@@ -67,3 +69,15 @@ _Licata, L., Lo Surdo, P., Iannuccelli, M., Palma, A., Micarelli, E., Perfetto,
**StringDB:**

_von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen MA, Bork P. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D433-7. doi: 10.1093/nar/gki005._

+**ChEMBL:**
+
+_Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett S, de Veij M, Ioannidis H, Lopez DM, Mosquera JF, Magarinos MP, Bosc N, Arcila R, Kizilören T, Gaulton A, Bento AP, Adasme MF, Monecke P, Landrum GA, Leach AR. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024 Jan 5;52(D1):D1180-D1192._
+
+**BindingDB:**
+
+_Gilson,M.K., Liu,T., Baitaluk,M., Nicola,G., Hwang, L. and Chong,J. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology Nucleic Acids Research 44:D1045-D1053 (2015)._
+
+**Guide to Pharmacology:**
+
+_Harding SD, Armstrong JF, Faccenda E, Southan C, Alexander SPH, Davenport AP, Spedding M, Davies JA. (2023) The IUPHAR/BPS Guide to PHARMACOLOGY in 2024. Nucl. Acids Res. 2024; 52(D1):D1438-D1449. doi:10.1093/nar/gkad944. [Full text]. PMID: 37897341._
2,922 changes: 2,922 additions & 0 deletions efaar_benchmarking/benchmark_annotations/compound_gene_interactions.csv

Large diffs are not rendered by default.
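
For readers unfamiliar with the metric named in the README change, here is a minimal sketch of average precision for one compound's ranked gene list. It is not the implementation added in this commit: the function name `average_precision`, the gene names, and the target set are all hypothetical, and the ranking is assumed to be ordered from most to least similar.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[bool]) -> float:
    """Average precision of a ranked list; True marks a known interaction."""
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_labels, start=1):
        if is_hit:
            hits += 1
            # Precision at this rank: known interactions seen so far / rank.
            precision_sum += hits / rank
    # With no known interactions in the list, the score is defined here as 0.
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking of genes for a single compound, most similar first.
ranked_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
# Hypothetical annotated targets for that compound.
known_targets = {"GENE_A", "GENE_C"}

labels = [gene in known_targets for gene in ranked_genes]
score = average_precision(labels)
print(f"average precision: {score:.3f}")
```

Averaging this per-compound score over all annotated compounds would give one summary number, though whether the notebook aggregates that way is not shown in this diff.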
