Fix `cell_transition` bug #751

selmanozleyen · 2024-10-07T13:28:48Z

Related to #743. This used to raise an error, but now it doesn't:

import numpy as np
import pandas as pd

from moscot import datasets
from moscot.problems.space import MappingProblem

adata_sc = datasets.drosophila(spatial=False)
adata_sp = datasets.drosophila(spatial=True)
adata_sc, adata_sp

if "test_col_1" in adata_sp.obs:
    del adata_sp.obs["test_col_1"]
if "test_col_2" in adata_sc.obs:
    del adata_sc.obs["test_col_2"]
if "test_col_1" in adata_sc.obs:
    del adata_sc.obs["test_col_1"]
if "test_col_2" in adata_sp.obs:
    del adata_sp.obs["test_col_2"]

adata_sc.obs["test_col_2"] = pd.Categorical(np.random.choice(["a", "b"], size=len(adata_sc)))
adata_sp.obs["test_col_1"] = pd.Categorical(np.random.choice(["a", "b"], size=len(adata_sp)))

mp = MappingProblem(adata_sc, adata_sp)
mp = mp.prepare(
    sc_attr={"attr": "obsm", "key": "X_pca"}, xy_callback="local-pca"
).solve(epsilon=10.0)


mp.cell_transition("src", "tgt", "test_col_1", "test_col_2")

I just separated the key used for storing the result from the key used to push/pull.

I also have some notes on the function for future refactoring:

`other_key` is always None from what I understand.
Instead of a separate argument for cell aggregation, I think the user could just set `source_groups=None`. (I am still not sure about this; I am reading the code more at the moment.)

MUCDK · 2024-10-08T06:19:15Z

src/moscot/base/problems/_mixins.py

@@ -198,22 +198,24 @@ def _cell_transition_online(
        )
        df_source = _get_df_cell_transition(
            self.adata,
-            [source_annotation_key] if aggregation_mode == "cell" else [source_annotation_key, target_annotation_key],


Why can we drop the distinction between the two aggregation modes?

This code only worked because we always expected source_annotation_key == target_annotation_key. Since target_annotation_key might not be in self.adata, I am not sure why it was implemented this way. For example, in the reproduction above, the old code would expect test_col_2 to be present on adata_sp.
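
A minimal sketch of the failure mode described here, using plain pandas DataFrames as stand-ins for `adata_sp.obs` and `adata_sc.obs`. The frames and their contents are made up for illustration, and this is not moscot's code; it only shows that asking one table for both annotation columns fails once the two keys differ.

```python
import pandas as pd

# Hypothetical stand-ins for adata_sp.obs and adata_sc.obs.
obs_sp = pd.DataFrame({"test_col_1": pd.Categorical(["a", "b", "a"])})
obs_sc = pd.DataFrame({"test_col_2": pd.Categorical(["b", "b", "a"])})

source_key, target_key = "test_col_1", "test_col_2"

# Requesting both keys from the source table: works only if both columns
# live on the same object, e.g. when source_key == target_key.
try:
    obs_sp[[source_key, target_key]]
    both_keys_ok = True
except KeyError:
    both_keys_ok = False

# Requesting each key only from the table that is expected to hold it.
df_source = obs_sp[[source_key]]
df_target = obs_sc[[target_key]]

print(both_keys_ok)              # False: test_col_2 is not on obs_sp
print(list(df_source.columns))   # ['test_col_1']
print(list(df_target.columns))   # ['test_col_2']
```

With identical key names on both objects the first lookup would succeed, which is presumably why the problem went unnoticed until #743.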

MUCDK · 2024-10-08T06:21:43Z

I might be missing something here, but why do you think that `other_key` is always None?
Whenever we have `other_adata`, we need it, don't we?

_get_df_cell_transition(
    self.adata if other_adata is None else other_adata,
    [target_annotation_key] if aggregation_mode == "cell" else [source_annotation_key, target_annotation_key],
    key if other_adata is None else other_key,
    target,
)

selmanozleyen · 2024-10-08T11:18:14Z

Hi, for example it's set to None here even though `other_adata` is given:

moscot/src/moscot/problems/space/_mixins.py

Lines 672 to 685 in 578e3eb

return self._cell_transition(
    key=self.batch_key,
    source=source,
    target=target,
    source_groups=source_groups,
    target_groups=target_groups,
    forward=forward,
    aggregation_mode=aggregation_mode,
    other_key=None,
    other_adata=self.adata_sc,
    batch_size=batch_size,
    normalize=normalize,
    key_added=key_added,
)

I checked again and couldn't find a call where it is set to anything else.

MUCDK · 2024-10-08T12:16:12Z

I see. Might that be the reason for the initial bug, though?

selmanozleyen · 2024-10-09T16:19:43Z

Hi @MUCDK, I refactored the private methods with these objectives:

Less duplicate code.
Less use of pandas in the main algorithms. (I personally don't like using dataframes in algorithm implementations, since the pandas API changes often and code that depends on it ends up throwing warnings or errors.) I only use pandas just before the return line.
Private methods take fewer arguments and no longer depend on `forward` internally.

Would you like me to change anything else?
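
As a rough illustration of the "NumPy first, pandas only at the return line" objective, here is a sketch that aggregates a transition matrix by group labels. The function `aggregate_transition` and its arguments are hypothetical and are not the refactored moscot helpers; it assumes a dense matrix and that every source group carries non-zero mass.

```python
import numpy as np
import pandas as pd


def aggregate_transition(tmat, src_labels, tgt_labels, normalize=True):
    """Sum a (n_src, n_tgt) transition matrix over source/target groups."""
    src_labels = np.asarray(src_labels)
    tgt_labels = np.asarray(tgt_labels)
    src_groups = np.unique(src_labels)
    tgt_groups = np.unique(tgt_labels)

    # One-hot membership matrices: rows are cells, columns are groups.
    src_onehot = (src_labels[:, None] == src_groups[None, :]).astype(float)
    tgt_onehot = (tgt_labels[:, None] == tgt_groups[None, :]).astype(float)

    # Group-by-group mass, computed without any DataFrame operations.
    agg = src_onehot.T @ np.asarray(tmat, dtype=float) @ tgt_onehot
    if normalize:
        # Row-normalize; assumes no source group has zero total mass.
        agg = agg / agg.sum(axis=1, keepdims=True)

    # pandas is used only to label the result.
    return pd.DataFrame(agg, index=src_groups, columns=tgt_groups)


tmat = np.array([[1.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
result = aggregate_transition(tmat, ["a", "a", "b"], ["x", "y"])
print(result)
```

Keeping the arithmetic in NumPy means that changes in pandas defaults can only affect the final labeling step.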

MUCDK

That looks good, thanks!

fix the bug and set observed=False for future pandas and warning

42550e3
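
For context on the `observed=False` part of this commit message: recent pandas versions emit a `FutureWarning` when grouping by a categorical column without passing `observed` explicitly. The sketch below uses a made-up toy frame, not moscot data, to show the difference between the two settings.

```python
import pandas as pd

# Toy frame: category "c" is declared but never observed.
df = pd.DataFrame(
    {
        "cell_type": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
        "mass": [0.25, 0.25, 0.5],
    }
)

# observed=False keeps every declared category, including empty ones.
all_categories = df.groupby("cell_type", observed=False)["mass"].sum()

# observed=True keeps only categories that actually occur in the data.
seen_only = df.groupby("cell_type", observed=True)["mass"].sum()

print(list(all_categories.index))  # ['a', 'b', 'c']
print(list(seen_only.index))       # ['a', 'b']
```

Passing the argument explicitly pins the behaviour regardless of the pandas default and silences the warning.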

selmanozleyen mentioned this pull request Oct 7, 2024

celltype argument expected to have the same name in cell_transition #743

Closed

selmanozleyen linked an issue Oct 7, 2024 that may be closed by this pull request

celltype argument expected to have the same name in cell_transition #743

Closed

selmanozleyen requested a review from MUCDK October 7, 2024 13:44

MUCDK reviewed Oct 8, 2024

Merge branch 'main' into fix/celltype-mapping

eb431e2

selmanozleyen added 3 commits October 9, 2024 17:35

update _cell_transition_online

8f70c1e

refactor some functions

159eba0

remove some unnecesary lines

92acfbe

MUCDK approved these changes Oct 9, 2024

selmanozleyen merged commit ba8d30f into main Oct 10, 2024
8 checks passed

MUCDK mentioned this pull request Oct 10, 2024

incorrect arguments passed to annotation_mapping #742

Closed
