Add function to transfer cell type labels in MappingProblem and memory allocation issues #559
Comments
hi @valelorenzi9 , thank you for your interest in moscot.
it should work for more than 30 genes, although to be fair we don't have the
an easy solution to this would be to loop over chunks of genes and concatenate the resulting AnnData objects, such as:

    import anndata as ad
    import numpy as np

    adatas_l = []
    # split all genes into 100 lists of ~30 genes each
    for genes in np.array_split(adata_sc.var_names, 100):
        adatas_l.append(mp.impute(var_names=genes))
    adata_imputed = ad.concat(adatas_l, axis=1)
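To illustrate why chunking the genes gives the same result as imputing everything at once, here is a minimal NumPy sketch. It assumes imputation amounts to multiplying a row-normalised coupling matrix by the single-cell expression matrix; `coupling`, `expression` and `impute_chunk` are hypothetical toy stand-ins, not moscot objects or its API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spatial, n_sc, n_genes = 5, 8, 12

# Toy stand-ins (assumptions for illustration only, not moscot internals).
coupling = rng.random((n_spatial, n_sc))
coupling /= coupling.sum(axis=1, keepdims=True)  # each spatial cell's weights sum to 1
expression = rng.random((n_sc, n_genes))         # single-cell expression, cells x genes


def impute_chunk(gene_idx):
    """Impute only the selected gene columns, keeping peak memory small."""
    return coupling @ expression[:, gene_idx]


# Split the gene indices into 4 chunks, impute each, then stitch the columns together.
chunks = np.array_split(np.arange(n_genes), 4)
imputed = np.concatenate([impute_chunk(idx) for idx in chunks], axis=1)

# Reference: impute all genes in one go.
imputed_full = coupling @ expression
```

Because each gene column is computed independently, the chunked result matches the one-shot result; only the size of the intermediate array changes.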
you could use the

    dummy = pd.get_dummies(adata_sc.obs["annotation"])
    out = mp[("src", "tgt")].pull(dummy, scale_by_marginals=True)
    clusters = pd.Categorical([dummy.columns[i] for i in np.array(out.argmax(1))])
    adata_spatial.obs["annotation_mapped"] = clusters

the difference between the two methods is the way the cluster assignment for a spatial cell is selected: the first selects based on the sum of the transportation cost, the second based on the argmax. The former is more conservative and might return fewer clusters than are present in the source. That might be a sensible thing (especially in the non-low-rank case, where you might have explicitly set some tau for the unbalanced case) but might also not be. Interpretation would be required.
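The argmax-based assignment can be sketched on a toy example. This assumes the pull of one-hot labels behaves like multiplying a (spatial cells x single cells) coupling matrix by the one-hot label matrix; the `coupling` values and labels below are made up for illustration, and marginal scaling is not modelled.

```python
import numpy as np

# Hypothetical annotations for 4 dissociated (single-cell) cells.
labels = np.array(["A", "A", "B", "C"])
categories = np.unique(labels)  # sorted unique labels
one_hot = (labels[:, None] == categories[None, :]).astype(float)  # cells x classes

# Toy coupling: rows are 3 spatial cells, columns are the 4 single cells.
coupling = np.array([
    [0.20, 0.20, 0.02, 0.03],
    [0.02, 0.03, 0.20, 0.05],
    [0.03, 0.02, 0.03, 0.17],
])

# Per-class mass received by each spatial cell (spatial cells x classes).
scores = coupling @ one_hot

# Assign each spatial cell the class with the largest transported mass.
mapped = categories[scores.argmax(axis=1)]
```

Here every spatial cell gets exactly one label, so a class is only dropped if it never wins the argmax for any spatial cell.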
Thank you very much for the super helpful answers, @giovp!
For now your for loop solution worked perfectly fine and in very reasonable time, thank you for that! Would you be able to provide an example of how to transfer cell type annotations with the cell_transition method? I don't fully understand how to make it work in the case where you don't have any annotations in the spatial data. Thanks again!!
hi @valelorenzi9, apologies for the late reply, you can try something like this:

    df = problem.cell_transition(
        source="src",
        target="tgt",
        target_groups="annotation",
        forward=True,
        aggregation_mode="cell",
        batch_size=1024,
    )

if it asks for
understood. This means that you can play around with
implementation idea for this is the following @ArinaDanilina one method called
hi @valelorenzi9, we now have an example showcasing the function for annotation label transfer
Hi,
Thank you very much for developing this framework! I am using the MappingProblem to transfer gene expression information from scRNA-seq data to In Situ Sequencing data and I have two questions:
What are the memory requirements for the mp.impute() and mp.correlate() functions? For mp.impute(), if I try to impute all the HVGs in my scRNA-seq dataset (~3k genes) I get XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory allocating XXXX bytes. If I specify a subset of ~30 genes to impute then it works perfectly fine, but I was wondering what the upper limit on the number of genes you can impute is. The same memory error is also thrown when running mp.correlate(). I am using a VM with 300GB of RAM.
Are you planning to implement a function to also transfer the cell type labels from the dissociated data to the spatial data? If not, do you recommend a specific way of approaching the problem using the transferred gene expression information?
Thank you very much in advance!
Valentina