Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cell_transition bug #751

Merged
merged 5 commits into from
Oct 10, 2024
Merged

Fix cell_transition bug #751

merged 5 commits into from
Oct 10, 2024

Conversation

selmanozleyen
Copy link
Collaborator

@selmanozleyen selmanozleyen commented Oct 7, 2024

Hi @MUCDK ,

Related #743. This used to give an error but now it doesn't

import pandas as pd

adata_sc = datasets.drosophila(spatial=False)
adata_sp = datasets.drosophila(spatial=True)
adata_sc, adata_sp

if "test_col_1" in adata_sp.obs:
    del adata_sp.obs["test_col_1"]
if "test_col_2" in adata_sc.obs:
    del adata_sc.obs["test_col_2"]
if "test_col_1" in adata_sc.obs:
    del adata_sc.obs["test_col_1"]
if "test_col_2" in adata_sp.obs:
    del adata_sp.obs["test_col_2"]

adata_sc.obs["test_col_2"] = pd.Categorical(np.random.choice(["a", "b"], size=len(adata_sc)))
adata_sp.obs["test_col_1"] = pd.Categorical(np.random.choice(["a", "b"], size=len(adata_sp)))

mp = MappingProblem(adata_sc, adata_sp)
mp = mp.prepare(
    sc_attr={"attr": "obsm", "key": "X_pca"}, xy_callback="local-pca"
).solve(epsilon=10.0)


mp.cell_transition("src", "tgt", 
                   "test_col_1", "test_col_2")

I just separated the key used for storing the result and the key to push/pull.

I have some extra notes on function for future refactoring though:

  • other_keys is always None from what I understand.
  • Instead of a separate argument for cell aggregation I think user can just set source_groups=None. (I am still not sure about this I am reading the code more atm)

@@ -198,22 +198,24 @@ def _cell_transition_online(
)
df_source = _get_df_cell_transition(
self.adata,
[source_annotation_key] if aggregation_mode == "cell" else [source_annotation_key, target_annotation_key],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can we drop the distinction between the two aggregation modes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we always expected source_annotation_key=target_annotation_key this code worked. But since target_annotation_key might not be in self.adata, I am not sure why it was implemented this way. For example if it's not this way it would expect test_col_2 on adata_sp

@MUCDK
Copy link
Collaborator

MUCDK commented Oct 8, 2024

I might miss something here, but why do you think that other_key is always None?
Whenever we have other_adata, we need it, don't we?

 _get_df_cell_transition(
            self.adata if other_adata is None else other_adata,
            [target_annotation_key] if aggregation_mode == "cell" else [source_annotation_key, target_annotation_key],
            key if other_adata is None else other_key,
            target,
        )

@selmanozleyen
Copy link
Collaborator Author

Hi for example it's set to None here even though the other_data is given

return self._cell_transition(
key=self.batch_key,
source=source,
target=target,
source_groups=source_groups,
target_groups=target_groups,
forward=forward,
aggregation_mode=aggregation_mode,
other_key=None,
other_adata=self.adata_sc,
batch_size=batch_size,
normalize=normalize,
key_added=key_added,
)

I checked again and couldn't see.

@MUCDK
Copy link
Collaborator

MUCDK commented Oct 8, 2024

I see , might that be the reason for the initial bug though?

@selmanozleyen
Copy link
Collaborator Author

hi @MUCDK , I refactored the private methods with these objectives:

  • less duplicate code
  • less usage of pandas in main algorithms. ( I personally don't like using dataframes in algorithm implementations as it gets changed a lot and throws warnings or errors with it). I only use pandas before the return line.
  • private methods take less arguments and are independent of forwards within themselves.

Would you like me to change anything else?

Copy link
Collaborator

@MUCDK MUCDK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good, thanks!

@selmanozleyen selmanozleyen merged commit ba8d30f into main Oct 10, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

celltype argument expected to have the same name in cell_transition
2 participants