Small performance improvement to process_item
#395
…improvements to process
LGTM, but this needs changes because of its imports. bionemo-testing isn't a main, run-time dependency of bionemo-geneformer; it's a test-time-only dependency. To unblock this PR, the load function needs to be moved into bionemo-core, and bionemo-testing needs to be refactored to import it from there so that we don't break backwards compatibility.
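The refactor requested above amounts to a re-export: the function lives in the run-time package, and the test-only package keeps exposing the same object so existing imports keep working. The following is a minimal sketch of that pattern only. The module names `core_pkg_load` and `testing_pkg_load`, and the body of `load`, are hypothetical stand-ins, not the actual bionemo-core or bionemo-testing layout; the modules are built in memory purely so the sketch runs on its own.

```python
import sys
import types


def load(name: str) -> str:
    """Hypothetical placeholder for the real loader; returns a fake path."""
    return f"/cache/{name}"


# Stand-in for the run-time package that becomes the new home of `load`.
core_module = types.ModuleType("core_pkg_load")
core_module.load = load
sys.modules["core_pkg_load"] = core_module

# Stand-in for the test-only package: it re-exports the same function
# object instead of defining its own copy, so old import paths still work.
shim_module = types.ModuleType("testing_pkg_load")
shim_module.load = core_module.load
sys.modules["testing_pkg_load"] = shim_module

# Old and new import paths resolve to the identical object.
from core_pkg_load import load as new_load
from testing_pkg_load import load as old_load

print(old_load is new_load)
```

With this arrangement, run-time code imports only from the core package, while callers that still use the old test-package path are unaffected.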
sub-packages/bionemo-geneformer/src/bionemo/geneformer/data/singlecell/dataset.py
LGTM. We can eventually move this profiling into a unit test.
/build-ci
/build-ci
sub-packages/bionemo-geneformer/src/bionemo/geneformer/data/singlecell/dataset.py
…nglecell/dataset.py Co-authored-by: Malcolm Greaves <[email protected]> Signed-off-by: John St. John <[email protected]>
/build-ci
…hn/process_dataset_perf
/build-ci
Summary
Run
python -m bionemo.geneformer.data.singlecell.dataset
Baseline:
Processed 31208 rows in 47.10491418838501 seconds
Processed 31208 rows in 47.388004779815674 seconds
After vectorization:
Processed 31208 rows in 44.2215359210968 seconds
Processed 31208 rows in 44.54389500617981 seconds
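For readers unfamiliar with the kind of change being timed, here is a minimal sketch of replacing a per-element Python loop with a single NumPy operation, plus a timing loop that prints output in the same style as above. It is not the code from this PR: the function names, the median-normalization step, and the synthetic data are all assumptions chosen for illustration.

```python
import time

import numpy as np


def normalize_loop(counts, gene_idxs, medians):
    """Per-gene Python loop: divide each count by that gene's median."""
    out = []
    for count, idx in zip(counts, gene_idxs):
        out.append(count / medians[idx])
    return np.asarray(out, dtype=np.float64)


def normalize_vectorized(counts, gene_idxs, medians):
    """Same result using one indexed lookup and one array division."""
    return counts / medians[gene_idxs]


# Synthetic stand-in data (hypothetical sizes, not the real dataset).
rng = np.random.default_rng(0)
n_genes = 1_000
medians = rng.uniform(1.0, 10.0, size=n_genes)
rows = [
    (rng.uniform(0.0, 100.0, size=200), rng.integers(0, n_genes, size=200))
    for _ in range(500)
]

for fn in (normalize_loop, normalize_vectorized):
    start = time.perf_counter()
    results = [fn(counts, idxs, medians) for counts, idxs in rows]
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: processed {len(rows)} rows in {elapsed} seconds")
```

How much a change like this helps depends on how much of the per-row time the replaced loop accounts for; when other steps dominate, the end-to-end gain is modest, as in the timings above.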