Hi folks,

I'm working with some very large aggregate single-cell datasets and seem to be running up against the memory requirements for calculating velocity.
I've noticed that the velocity and related AnnData layers aren't sparse; it looks like I could use roughly 20% of the memory by making them so. Can the velocity calculation be done without dense matrices, or are they required?
See below for what this looks like on a small subset of the data I'd like to use (roughly 1% of the total dataset).
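As a rough sanity check on that 20% figure (my own back-of-envelope arithmetic, assuming float64 values and 32-bit CSR indices, using the layer densities shown further down):

# A dense float64 layer costs 8 bytes per entry; a CSR copy costs about
# (8 + 4) bytes per stored nonzero plus a small indptr array, so at density d
# it takes roughly 1.5 * d of the dense footprint.
def csr_fraction_of_dense(density, value_bytes=8, index_bytes=4):
    return density * (value_bytes + index_bytes) / value_bytes

print(csr_fraction_of_dense(0.19))  # velocity layer at ~19% density -> ~0.285
print(csr_fraction_of_dense(0.11))  # Ms layer at ~11% density -> ~0.165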
When I use the "stochastic" velocity method, I get this:
import scvelo as scv
import sys
import numpy as np  # used for the density calculation further down

# Load the loom file, keeping the spliced/unspliced layers sparse
ldata = scv.read_loom(loomfile, sparse=True)
print(f"{sys.getsizeof(ldata)/1024**3} Gb")
>>> 0.17349910642951727 Gb
# Reformat barcodes: take the part after the loom sample prefix, drop the
# trailing character, and append the sample suffix '_10'
barcodes = [bc.split(':')[1] for bc in ldata.obs.index.tolist()]
barcodes = [bc[:-1] + '_10' for bc in barcodes]
ldata.obs.index = barcodes
ldata.var_names_make_unique()
# Standard preprocessing: filter/normalize, compute moments (Ms/Mu),
# then estimate velocities with the stochastic model
scv.pp.filter_and_normalize(ldata)
scv.pp.moments(ldata)
scv.tl.velocity(ldata, mode='stochastic', use_highly_variable=False)
# Report the storage type and density (fraction of nonzero entries) of each layer
for item in ldata.layers:
    density = ldata.layers[item].nonzero()[0].shape[0] / np.prod(ldata.layers[item].shape)
    type_name = type(ldata.layers[item]).__name__
    print(f"layer[{item}] type:{type_name} density:{density}")
>>> layer[matrix] type:csr_matrix density:0.01598885532107232
>>> layer[ambiguous] type:csr_matrix density:0.0017801979105726371
>>> layer[spliced] type:csr_matrix density:0.00822740444802984
>>> layer[unspliced] type:csr_matrix density:0.007044277223715236
>>> layer[Ms] type:ndarray density:0.11253607546389915
>>> layer[Mu] type:ndarray density:0.13557148290010484
>>> layer[velocity] type:ndarray density:0.18959097503522324
>>> layer[variance_velocity] type:ndarray density:0.004700277147654737
print(f"{sys.getsizeof(ldata)/1024**3} Gb")
>>> 8.67932164017111 Gb
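And here's the kind of post-hoc workaround I've been considering (my own sketch, not an official scVelo option): converting the dense layers back to CSR after the fact. I don't know whether downstream functions such as the velocity graph will accept sparse layers here, so this only addresses storage, not the computation itself.

from scipy.sparse import csr_matrix, issparse

# Convert the dense layers produced by pp.moments / tl.velocity to CSR and
# compare the footprints (data + indices + indptr vs. the dense nbytes)
for name in ['Ms', 'Mu', 'velocity', 'variance_velocity']:
    layer = ldata.layers[name]
    if issparse(layer):
        continue
    dense_gb = layer.nbytes / 1024**3
    sparse_layer = csr_matrix(layer)
    sparse_gb = (sparse_layer.data.nbytes
                 + sparse_layer.indices.nbytes
                 + sparse_layer.indptr.nbytes) / 1024**3
    print(f"{name}: dense {dense_gb:.2f} Gb -> CSR {sparse_gb:.2f} Gb")
    ldata.layers[name] = sparse_layer  # may need to be densified again downstream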
When I use the dynamical velocity method, I similarly see dense matrices, but instead of zeros they're filled with NaNs.
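For reference, this is roughly how I'm checking that (a quick sketch; I'm assuming the usual recover_dynamics followed by tl.velocity sequence for the dynamical mode, and since I'm not sure which fitted layers it adds in every scVelo version, the loop just inspects whatever dense layers are present):

import numpy as np

# Fit the dynamical model, then estimate velocities from it
scv.tl.recover_dynamics(ldata)
scv.tl.velocity(ldata, mode='dynamical', use_highly_variable=False)

# Report what fraction of each dense layer is NaN rather than zero
for name, layer in ldata.layers.items():
    if isinstance(layer, np.ndarray):
        print(f"layer[{name}] NaN fraction: {np.isnan(layer).mean():.3f}")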
Is this the expected behavior, or am I doing something incorrectly?
P.S.
Is there any way to calculate velocity on a large dataset chunk-wise, e.g. by splitting the cells into 100 subsets (roughly what I mean is sketched below)? Would that be an issue for the accuracy of the velocity calculation?
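Purely to illustrate the question, here's the naive chunking I have in mind. It's hypothetical, and I suspect it isn't equivalent to a full run, since pp.moments would build its kNN graph within each chunk rather than over all cells:

import numpy as np
import anndata as ad
import scvelo as scv

# Naive chunk-wise sketch: assume ldata has already been filtered/normalized
# on the full dataset, then run moments + velocity per chunk and reassemble
n_chunks = 100
chunk_indices = np.array_split(np.arange(ldata.n_obs), n_chunks)

chunks = []
for idx in chunk_indices:
    sub = ldata[idx].copy()               # independent AnnData per chunk
    scv.pp.moments(sub)                   # kNN graph restricted to this chunk
    scv.tl.velocity(sub, mode='stochastic', use_highly_variable=False)
    chunks.append(sub)

ldata_chunked = ad.concat(chunks)         # concatenate the cells back together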