
Very large datasets encounter memory issues - loaded datasets do not update URL #485

Open
ShrimpCryptid opened this issue Dec 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ShrimpCryptid
Contributor

Description

A clear description of the bug

Expected Behavior

What did you expect to happen instead?

Reproduction

  1. Open https://timelapse.allencell.org
  2. Click load and paste in a dataset: https://dev-aics-dtp-001.int.allencell.org/assay-dev/users/Chantelle/colorizer/new_backdrop/exploratory_dataset/collection.json
  3. The dataset will either fail to load and crash OR load normally but the URL will not update.

[Screenshot]

Environment

Any additional information about your environment. E.g. OS version, python version, browser etc.

@ShrimpCryptid ShrimpCryptid added the bug Something isn't working label Dec 2, 2024
@ShrimpCryptid
Contributor Author

ShrimpCryptid commented Dec 2, 2024

Working hypothesis is that Chantelle's dataset has many more features than previous datasets, so it takes up significantly more memory when initially loaded. This causes the out-of-memory error and may also be causing the strange behavior where the URL doesn't update.

[Screenshot: heap snapshot from the memory profiler]

I think this is what's causing it: there's an Array allocated that has 4x the elements of the other arrays but is taking up 90x the space? Note that the bounds data has 4x the elements of a feature array, but there are 91 features in the dataset...
I'm not entirely sure how to read this, so I'll need to do some more digging.
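As a sanity check on that 4x-vs-90x discrepancy, here's a minimal sketch of what the typed-array sizes should look like, assuming each feature column is a Float32Array with one value per object and bounds holds four values per object (the variable names and the 206,705 object count come from the napkin math below, not from the actual code):

```ts
// Expected shallow sizes, assuming plain Float32Array storage (names are hypothetical).
const numObjects = 206_705; // object count quoted in the napkin math below

const featureColumn = new Float32Array(numObjects); // one value per object
const bounds = new Float32Array(numObjects * 4);    // four values per object (bounding box)

console.log(featureColumn.byteLength); //   826,820 bytes (~0.8 MB) per feature column
console.log(bounds.byteLength);        // 3,307,280 bytes (~3.3 MB): 4x a feature column, not 90x
```

If the heap snapshot shows a ~90x retained size for that array, the extra memory is probably coming from whatever it keeps alive (e.g. a shared or oversized backing ArrayBuffer), not from the element data itself.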

Update:

  • Shallow size is the size of the object itself in memory.
  • Retained size is the size of the object + all its dependent objects.
  • Currently, JSArrayBufferData is taking up 187 MB, likely due to all of the parquet files being loaded into memory.
  • Napkin math (see the sketch below):
    • 206,705 objects in the dataset x 91 features x size of an F32 (4 bytes) = 94,896,620 bytes
    • tracks + times + centroids + bounds + outliers add 21 more bytes of data per object = 4,340,805 bytes
    • 99,237,420 bytes total -> likely there's some array duplication that I can make more efficient? (or this is from the images...)
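The same estimate as a quick sketch (constants copied from the list above; note that 206,705 x 91 x 4 works out to roughly 75 MB rather than the ~95 MB quoted, so one of the counts above may be slightly off):

```ts
// Rough memory estimate for the loaded dataset, using the figures above.
const numObjects = 206_705;
const numFeatures = 91;
const F32_BYTES = 4;
const extraBytesPerObject = 21; // tracks + times + centroids + bounds + outliers

const featureBytes = numObjects * numFeatures * F32_BYTES; // 75,240,620 bytes (~75 MB)
const extraBytes = numObjects * extraBytesPerObject;       //  4,340,805 bytes (~4.3 MB)

console.log(featureBytes + extraBytes); // ~80 MB total, well under the 187 MB seen for JSArrayBufferData
```

Either way, the raw feature data alone doesn't account for the full 187 MB, which is consistent with some duplication (or the image data) making up the rest.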

@ShrimpCryptid ShrimpCryptid changed the title Loaded datasets do not update URL Very large datasets encounter memory issues - loaded datasets do not update URL Dec 3, 2024