Hi folks,

I'm working with some very large aggregate single-cell datasets and seem to be running up against the memory requirements for calculating velocity.
I've noticed that the velocity and related AnnData layers aren't sparse; it looks like I could use roughly 20% of the memory by making them so. Can the velocity calculation be done without dense matrices, or are they required?
See below for what this looks like on a small subset of the data I'd like to use (roughly 1% of the total dataset).
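As a rough sanity check on that 20% figure (my own back-of-envelope arithmetic, assuming float64 values and 32-bit CSR indices, using the layer densities shown further down):

# A dense float64 layer costs 8 bytes per entry; a CSR copy costs about
# (8 + 4) bytes per stored nonzero plus a small indptr array, so at density d
# it takes roughly 1.5 * d of the dense footprint.
def csr_fraction_of_dense(density, value_bytes=8, index_bytes=4):
    return density * (value_bytes + index_bytes) / value_bytes

print(csr_fraction_of_dense(0.19))  # velocity layer at ~19% density -> ~0.285
print(csr_fraction_of_dense(0.11))  # Ms layer at ~11% density -> ~0.165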
When I use the "stochastic" velocity method, I get this:
import scvelo as scv
import sys
import numpy as np  # used for the density calculation further down

# Load the loom file, keeping the spliced/unspliced layers sparse
ldata = scv.read_loom(loomfile, sparse=True)
print(f"{sys.getsizeof(ldata)/1024**3} Gb")
>>> 0.17349910642951727 Gb
# Reformat barcodes: take the part after the loom sample prefix, drop the
# trailing character, and append the sample suffix '_10'
barcodes = [bc.split(':')[1] for bc in ldata.obs.index.tolist()]
barcodes = [bc[:-1] + '_10' for bc in barcodes]
ldata.obs.index = barcodes
ldata.var_names_make_unique()
# Standard preprocessing: filter/normalize, compute moments (Ms/Mu),
# then estimate velocities with the stochastic model
scv.pp.filter_and_normalize(ldata)
scv.pp.moments(ldata)
scv.tl.velocity(ldata, mode='stochastic', use_highly_variable=False)
# Report the storage type and density (fraction of nonzero entries) of each layer
for item in ldata.layers:
    density = ldata.layers[item].nonzero()[0].shape[0] / np.prod(ldata.layers[item].shape)
    type_name = type(ldata.layers[item]).__name__
    print(f"layer[{item}] type:{type_name} density:{density}")
>>> layer[matrix] type:csr_matrix density:0.01598885532107232
>>> layer[ambiguous] type:csr_matrix density:0.0017801979105726371
>>> layer[spliced] type:csr_matrix density:0.00822740444802984
>>> layer[unspliced] type:csr_matrix density:0.007044277223715236
>>> layer[Ms] type:ndarray density:0.11253607546389915
>>> layer[Mu] type:ndarray density:0.13557148290010484
>>> layer[velocity] type:ndarray density:0.18959097503522324
>>> layer[variance_velocity] type:ndarray density:0.004700277147654737
print(f"{sys.getsizeof(ldata)/1024**3} Gb")
>>> 8.67932164017111 Gb
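And here's the kind of post-hoc workaround I've been considering (my own sketch, not an official scVelo option): converting the dense layers back to CSR after the fact. I don't know whether downstream functions such as the velocity graph will accept sparse layers here, so this only addresses storage, not the computation itself.

from scipy.sparse import csr_matrix, issparse

# Convert the dense layers produced by pp.moments / tl.velocity to CSR and
# compare the footprints (data + indices + indptr vs. the dense nbytes)
for name in ['Ms', 'Mu', 'velocity', 'variance_velocity']:
    layer = ldata.layers[name]
    if issparse(layer):
        continue
    dense_gb = layer.nbytes / 1024**3
    sparse_layer = csr_matrix(layer)
    sparse_gb = (sparse_layer.data.nbytes
                 + sparse_layer.indices.nbytes
                 + sparse_layer.indptr.nbytes) / 1024**3
    print(f"{name}: dense {dense_gb:.2f} Gb -> CSR {sparse_gb:.2f} Gb")
    ldata.layers[name] = sparse_layer  # may need to be densified again downstream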
When I use the dynamical velocity method, I similarly see dense matrices, but instead of zeros they're filled with NaNs.
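For reference, this is roughly how I'm checking that (a quick sketch; I'm assuming the usual recover_dynamics followed by tl.velocity sequence for the dynamical mode, and since I'm not sure which fitted layers it adds in every scVelo version, the loop just inspects whatever dense layers are present):

import numpy as np

# Fit the dynamical model, then estimate velocities from it
scv.tl.recover_dynamics(ldata)
scv.tl.velocity(ldata, mode='dynamical', use_highly_variable=False)

# Report what fraction of each dense layer is NaN rather than zero
for name, layer in ldata.layers.items():
    if isinstance(layer, np.ndarray):
        print(f"layer[{name}] NaN fraction: {np.isnan(layer).mean():.3f}")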
Is this the expected behavior, or am I doing something incorrectly?
P.S.
Is there any way to calculate velocity on a large dataset chunk-wise, e.g. by splitting the cells into 100 subsets (roughly what I mean is sketched below)? Would that be an issue for the accuracy of the velocity calculation?
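Purely to illustrate the question, here's the naive chunking I have in mind. It's hypothetical, and I suspect it isn't equivalent to a full run, since pp.moments would build its kNN graph within each chunk rather than over all cells:

import numpy as np
import anndata as ad
import scvelo as scv

# Naive chunk-wise sketch: assume ldata has already been filtered/normalized
# on the full dataset, then run moments + velocity per chunk and reassemble
n_chunks = 100
chunk_indices = np.array_split(np.arange(ldata.n_obs), n_chunks)

chunks = []
for idx in chunk_indices:
    sub = ldata[idx].copy()               # independent AnnData per chunk
    scv.pp.moments(sub)                   # kNN graph restricted to this chunk
    scv.tl.velocity(sub, mode='stochastic', use_highly_variable=False)
    chunks.append(sub)

ldata_chunked = ad.concat(chunks)         # concatenate the cells back together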