
Very large datasets encounter memory issues - loaded datasets do not update URL #485

Open
ShrimpCryptid opened this issue Dec 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ShrimpCryptid
Contributor

Description

A clear description of the bug

Expected Behavior

What did you expect to happen instead?

Reproduction

  1. Open https://timelapse.allencell.org
  2. Click load and paste in a dataset: https://dev-aics-dtp-001.int.allencell.org/assay-dev/users/Chantelle/colorizer/new_backdrop/exploratory_dataset/collection.json
  3. The dataset will either fail to load and crash OR load normally but the URL will not update.

[Screenshot]

Environment

Any additional information about your environment. E.g. OS version, python version, browser etc.

@ShrimpCryptid ShrimpCryptid added the bug Something isn't working label Dec 2, 2024
@ShrimpCryptid
Contributor Author

ShrimpCryptid commented Dec 2, 2024

Working hypothesis is that Chantelle's dataset has many more features than previous datasets, so it takes up significantly more memory when initially loaded. This causes the out-of-memory error and may also be causing the strange behavior where the URL doesn't update.

[Screenshot: heap snapshot from the memory profiler]

I think this is what's causing it: there's an Array allocated that has 4x the elements of the other arrays but is taking up 90x the space? Note that the bounds data has 4x the elements of a feature array, but there are 91 features in the dataset...
I'm not entirely sure how to read this, so I'll need to do some more digging.
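As a sanity check on that 4x-vs-90x discrepancy, here's a minimal sketch of what the typed-array sizes should look like, assuming each feature column is a Float32Array with one value per object and bounds holds four values per object (the variable names and the 206,705 object count come from the napkin math below, not from the actual code):

```ts
// Expected shallow sizes, assuming plain Float32Array storage (names are hypothetical).
const numObjects = 206_705; // object count quoted in the napkin math below

const featureColumn = new Float32Array(numObjects); // one value per object
const bounds = new Float32Array(numObjects * 4);    // four values per object (bounding box)

console.log(featureColumn.byteLength); //   826,820 bytes (~0.8 MB) per feature column
console.log(bounds.byteLength);        // 3,307,280 bytes (~3.3 MB): 4x a feature column, not 90x
```

If the heap snapshot shows a ~90x retained size for that array, the extra memory is probably coming from whatever it keeps alive (e.g. a shared or oversized backing ArrayBuffer), not from the element data itself.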

Update:

  • Shallow size is the size of the object itself in memory.
  • Retained size is the size of the object + all its dependent objects.
  • Currently, JSArrayBufferData is taking up 187 MB, likely due to all of the parquet files being loaded into memory.
  • Napkin math (see the sketch below):
    • 206,705 objects in the dataset x 91 features x size of an F32 (4 bytes) = 94,896,620 bytes
    • tracks + times + centroids + bounds + outliers add 21 more bytes of data per object = 4,340,805 bytes
    • 99,237,420 bytes total -> likely there's some array duplication that I can make more efficient? (or this is from the images...)
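The same estimate as a quick sketch (constants copied from the list above; note that 206,705 x 91 x 4 works out to roughly 75 MB rather than the ~95 MB quoted, so one of the counts above may be slightly off):

```ts
// Rough memory estimate for the loaded dataset, using the figures above.
const numObjects = 206_705;
const numFeatures = 91;
const F32_BYTES = 4;
const extraBytesPerObject = 21; // tracks + times + centroids + bounds + outliers

const featureBytes = numObjects * numFeatures * F32_BYTES; // 75,240,620 bytes (~75 MB)
const extraBytes = numObjects * extraBytesPerObject;       //  4,340,805 bytes (~4.3 MB)

console.log(featureBytes + extraBytes); // ~80 MB total, well under the 187 MB seen for JSArrayBufferData
```

Either way, the raw feature data alone doesn't account for the full 187 MB, which is consistent with some duplication (or the image data) making up the rest.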

@ShrimpCryptid ShrimpCryptid changed the title Loaded datasets do not update URL Very large datasets encounter memory issues - loaded datasets do not update URL Dec 3, 2024