Add function to transfer cell type labels in MappingProblem and memory allocation issues #559

valelorenzi9 · 2023-06-22T15:29:06Z

Hi,
Thank you very much for developing this framework! I am using the MappingProblem to transfer gene expression information from scRNA-seq data to In Situ Sequencing data and I have two questions:

What are the memory requirements of the mp.impute() and mp.correlate() functions? For mp.impute(), if I try to impute all the HVGs in my scRNA-seq dataset (~3k genes), I get XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory allocating XXXX bytes. If I specify a subset of ~30 genes to impute, it works perfectly fine, but I was wondering what the upper limit on the number of genes you can impute is. The same memory error is also thrown when running mp.correlate(). I am using a VM with 300GB of RAM.
Are you planning to implement a function that also transfers the cell type labels from the dissociated data to the spatial data? If not, do you recommend a specific way of approaching the problem using the transferred gene expression information?

Thank you very much in advance!
Valentina

giovp · 2023-06-23T18:01:20Z

hi @valelorenzi9 , thank you for your interest in moscot.

What are the memory requirements of the mp.impute() and mp.correlate() functions? For mp.impute(), if I try to impute all the HVGs in my scRNA-seq dataset (~3k genes), I get XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory allocating XXXX bytes. If I specify a subset of ~30 genes to impute, it works perfectly fine, but I was wondering what the upper limit on the number of genes you can impute is. The same memory error is also thrown when running mp.correlate(). I am using a VM with 300GB of RAM.

It should work for more than 30 genes, although to be fair impute does not yet have a batch-wise implementation like the one in cell_transition. A couple of questions on this:

are you running it on GPU? if yes, try to pass device="cpu" to the function call
how many cells are in the source and target distribution?
are there batches in the spatial data? meaning, do you explicitly pass batch_key in prepare?

An easy solution would be to impute the genes in chunks with a for loop and concatenate the resulting AnnData objects, for example:

import anndata as ad
import numpy as np

adatas_l = []
# split all genes into 100 chunks of ~30 genes each and impute chunk by chunk
for genes in np.array_split(adata_sc.var_names, 100):
    adatas_l.append(mp.impute(var_names=genes))
adata_imputed = ad.concat(adatas_l, axis=1)
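To see why imputing in gene chunks gives the same result as imputing everything at once, here is a minimal NumPy sketch. It is not moscot's implementation: `impute_genes`, the random `coupling` matrix, and the "reference"/"query" naming are all hypothetical stand-ins, assuming only that imputation is a coupling-weighted average of reference expression, which acts on each gene column independently.

```python
import numpy as np


def impute_genes(coupling, expression):
    """Toy stand-in for imputation (an assumption, not moscot's code).

    coupling:   (n_reference, n_query) non-negative weights
    expression: (n_reference, n_genes) expression of the reference cells
    returns:    (n_query, n_genes) coupling-weighted average expression
    """
    # normalise so that the weights of every query cell sum to 1
    weights = coupling / coupling.sum(axis=0, keepdims=True)
    return weights.T @ expression


rng = np.random.default_rng(0)
n_reference, n_query, n_genes = 50, 20, 12
coupling = rng.random((n_reference, n_query))
expression = rng.random((n_reference, n_genes))

# all genes at once
full = impute_genes(coupling, expression)

# the same computation, four gene chunks at a time, concatenated along the gene axis
gene_chunks = np.array_split(np.arange(n_genes), 4)
chunked = np.concatenate(
    [impute_genes(coupling, expression[:, idx]) for idx in gene_chunks],
    axis=1,
)

print(np.allclose(full, chunked))
```

Because each gene column is handled independently, the chunk size only trades peak memory against the number of calls; it should not change the imputed values in this toy setting.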

Are you planning to implement a function that also transfers the cell type labels from the dissociated data to the spatial data? If not, do you recommend a specific way of approaching the problem using the transferred gene expression information?

you could use the cell_transition method or alternatively this:

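import numpy as np
import pandas as pd
# one-hot encode the cell type annotation of the dissociated cells
dummy = pd.get_dummies(adata_sc.obs["annotation"])
out = mp[("src", "tgt")].pull(dummy, scale_by_marginals=True)
# for each spatial cell, pick the label with the largest pulled value
clusters = pd.Categorical([dummy.columns[i] for i in np.array(out.argmax(1))])
adata_spatial.obs["annotation_mapped"] = clusters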

The difference between the two methods is how the cluster assignment for a spatial cell is selected: the first (cell_transition) selects it based on the sum of the transportation cost, the second based on the argmax. The former is more conservative and might return fewer clusters than are present in the source. That might be sensible (especially in the non-low-rank case, where you might have explicitly set some tau for the unbalanced case), but it might also not be. Interpretation is required.
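As a toy illustration of how a "sum"-style rule and a "max"-style rule can disagree, the sketch below applies two generic decision rules to a hand-written coupling. Everything here is an assumption for illustration: the `coupling` values, the row/column orientation (rows are spatial cells, columns are dissociated cells), and the variable names are hypothetical, and the two rules are not claimed to match either moscot code path exactly.

```python
import numpy as np

# labels of four dissociated cells and the set of categories (hypothetical data)
labels = np.array(["A", "A", "A", "B"])
categories = np.array(["A", "B"])

# one-hot encoding, the NumPy analogue of pd.get_dummies: shape (4 cells, 2 labels)
one_hot = (labels[:, None] == categories[None, :]).astype(float)

# toy coupling: rows are two spatial cells, columns are the four dissociated cells
coupling = np.array(
    [
        [0.2, 0.2, 0.2, 0.4],
        [0.1, 0.1, 0.1, 0.7],
    ]
)

# rule 1: sum the coupling mass per label, then take the label with the most mass
mass_per_label = coupling @ one_hot
by_summed_mass = categories[mass_per_label.argmax(axis=1)]

# rule 2: take the label of the single dissociated cell with the largest weight
by_strongest_cell = labels[coupling.argmax(axis=1)]

print(by_summed_mass.tolist())     # ['A', 'B']
print(by_strongest_cell.tolist())  # ['B', 'B']
```

For the first spatial cell, label "A" collects more total mass (about 0.6) than "B" (0.4), while the single strongest partner carries label "B", so the two rules return different annotations. This is the kind of case where interpretation is needed.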

valelorenzi9 · 2023-06-26T08:29:16Z

Thank you very much for the super helpful answers, @giovp!
To your questions:

As this is a small spatial sample, I am running everything on CPU
I have 34424 cells in the source (single cell) and 8391 cells in the target (spatial) distribution
There are no batches in the spatial data, it is just one sample/image, so I am not passing the batch_key in prepare

For now your for loop solution worked perfectly fine and in very reasonable time, thank you for that!

Would you be able to provide an example of how to transfer cell type annotations with the cell_transition method? I don't fully understand how to make it work in the case where you don't have any annotations in the spatial data.

Thanks again!!

giovp · 2023-07-05T18:24:05Z

hi @valelorenzi9, apologies for the late reply. You can try something like this:

df = problem.cell_transition(
    source="src",
    target="tgt",
    target_groups="annotation",
    forward=True,
    aggregation_mode="cell",
    batch_size=1024,
)

If it asks for source_groups, pass any adata.obs column; it is effectively not used, but we should probably change that behaviour.

I have 34424 cells in the source (single cell) and 8391 cells in the target (spatial) distribution

understood. This means that you can play around with tau_a and tau_b to make the problem unbalanced and hence potentially improve results by "not transporting" outliers. let me know if that helps

giovp · 2023-07-13T15:29:08Z

implementation idea for this is the following @ArinaDanilina

One method of SpatialAnalysisMixin, called celltype_mapping, that accepts two modes: sum uses cell_transition, max uses the argmax of the pullback distribution (see comment above). All the arguments of cell_transition or pullback also need to be exposed, as well as the potential target name (e.g. tgt by default, but if the user has batches, then they should also be passed).

ArinaDanilina · 2024-02-14T12:26:24Z

hi @valelorenzi9 , we now have an example showcasing the function for annotation label transfer
https://moscot.readthedocs.io/en/latest/notebooks/examples/problems/900_annotation_mapping.html

MUCDK assigned ArinaDanilina and giovp and unassigned ArinaDanilina Jun 22, 2023

MUCDK mentioned this issue Jun 26, 2023

Batch-wise computation of gene imputation. #569

Open

giovp assigned ArinaDanilina Jul 13, 2023

giovp mentioned this issue Aug 11, 2023

Pull/push in a batch-wise fashion #592

Open

giovp linked a pull request Jan 19, 2024 that will close this issue

adding _annotation_mapping in AnalysisMixin #585

Merged

ArinaDanilina closed this as completed in #585 Jan 19, 2024
