Cell type annotation: Harmony/KNN workflow #836

Open
Wants to merge 48 commits into main
@dorien-er (Contributor) commented Jul 12, 2024

Changelog

Workflow for harmony integration followed by KNN label transfer for cell type annotation.

Issue ticket number and link

Closes #xxxx (Replace xxxx with the GitHub issue number)

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

@dorien-er changed the title from "Harmony knn annoation workflow" to "Harmony knn annotation workflow" on Jul 12, 2024
VladimirShitov and others added 24 commits July 15, 2024 15:08
* cellranger mkgtf component working and tested

* removed comments

* changelog entry added

* test unique attribute in result

* multiple attribute par added

* removed unused packages

* use pytest, multiple attributes tested

---------

Co-authored-by: DriesSchaumont <[email protected]>
Co-authored-by: Dries Schaumont <[email protected]>
* cellranger mkgtf component working and tested

* removed comments

* changelog entry added

* test unique attribute in result

* multiple attribute par added

* removed unused packages

* use pytest, multiple attributes tested

---------

Co-authored-by: DriesSchaumont <[email protected]>
@dorien-er force-pushed the harmony-knn-annoation-workflow branch from 6bcfac3 to e2049f1 on July 15, 2024 13:20
@dorien-er changed the title from "Harmony knn annotation workflow" to "Cell type annotation: Harmony/KNN workflow" on Sep 10, 2024
@dorien-er marked this pull request as ready for review on September 10, 2024 06:45
- name: "--theta"
  type: double
  description: |
    Diversity clustering penalty parameter. Specify for each variable in group.by.vars.
Member

When reading this, I am not sure what group.by.vars refers to here. Is it related to another argument?
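For context: in Harmony's R interface (`RunHarmony`), `group.by.vars` names the metadata variables (batch covariates) to integrate over, and `theta` takes one diversity penalty per such variable, defaulting to 2. The sketch below shows how a single `--theta` value could be broadcast to every covariate, or a list validated against them. It is a minimal illustration under assumptions: `resolve_theta` and its `covariates` argument are hypothetical names, and the component's own covariate argument may be named differently.

```python
from typing import Sequence, Union


def resolve_theta(
    theta: Union[float, Sequence[float], None],
    covariates: Sequence[str],
    default: float = 2.0,
) -> list[float]:
    """Return one diversity penalty per batch covariate (hypothetical helper).

    A single value (or None) is broadcast to every covariate; a list must
    contain exactly one value per covariate, in the same order.
    """
    if not covariates:
        raise ValueError("At least one covariate is required.")
    if theta is None:
        # Harmony's documented default is theta = 2 for each variable.
        return [default] * len(covariates)
    if isinstance(theta, (int, float)):
        return [float(theta)] * len(covariates)
    values = [float(t) for t in theta]
    if len(values) != len(covariates):
        raise ValueError(
            f"Expected {len(covariates)} theta values (one per covariate), "
            f"got {len(values)}."
        )
    return values
```

Broadcasting a scalar keeps the common single-batch-key case simple, while the length check catches a mismatch between the penalties and the covariates they are meant to apply to.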

    `distance` (weight points by the inverse of their distance)
- name: "--n_neighbors"
  type: integer
  default: 15
Member

Add min?
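The `distance` option above describes inverse-distance weighting, the same idea as scikit-learn's `KNeighborsClassifier(weights="distance")`, and the "Add min?" remark refers to the fact that `n_neighbors` is only meaningful from 1 upwards. The following is a minimal pure-Python sketch of KNN label transfer for a single query cell, assuming Euclidean distance on the integrated embedding; `knn_predict` is a hypothetical helper, not the component's code.

```python
import math
from collections import defaultdict
from typing import Sequence


def knn_predict(
    query: Sequence[float],
    reference: Sequence[Sequence[float]],
    labels: Sequence[str],
    n_neighbors: int = 15,
    weights: str = "uniform",
) -> tuple[str, float]:
    """Predict a label for one query cell by k-nearest-neighbour voting.

    Returns the winning label and its share of the total vote weight.
    """
    # The lower bound a `min: 1` on --n_neighbors would express, plus the
    # upper bound imposed by the size of the reference.
    if n_neighbors < 1:
        raise ValueError("n_neighbors must be >= 1")
    if n_neighbors > len(reference):
        raise ValueError("n_neighbors cannot exceed the number of reference cells")
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")

    # Euclidean distance to every reference cell, keeping the nearest k.
    nearest = sorted(
        (math.dist(query, ref), label) for ref, label in zip(reference, labels)
    )[:n_neighbors]

    votes = defaultdict(float)
    has_exact_match = any(d == 0 for d, _ in nearest)
    for d, label in nearest:
        if weights == "uniform":
            votes[label] += 1.0
        elif has_exact_match:
            # As in scikit-learn: if any neighbour is at distance 0,
            # only the exact matches vote (weight 1, others weight 0).
            votes[label] += 1.0 if d == 0 else 0.0
        else:
            # Inverse-distance weighting: closer neighbours count more.
            votes[label] += 1.0 / d

    total = sum(votes.values())
    best = max(votes, key=votes.get)
    return best, votes[best] / total
```

With `uniform` weights a query sitting next to a small cluster can still be outvoted by a larger, more distant one; `distance` weighting reduces that effect, which is why exposing both options is useful.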

    roles: [ author, maintainer ]
  - __merge__: /src/authors/weiwei_schultz.yaml
    roles: [ contributor ]

Member

Could you add the test_dependencies to info?

Comment on lines +9 to +12
| map {id, state ->
def new_state = state + ["workflow_output": state.output]
[id, new_state]
}
Member

Suggested change
| map {id, state ->
def new_state = state + ["workflow_output": state.output]
[id, new_state]
}
| map {id, state ->
def new_state = state + ["workflow_output": state.output]
[id, new_state]
}
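In Groovy, `state + [...]` builds a new map containing the entries of both operands, with the right-hand entries taking precedence on duplicate keys, so the step above never mutates `state`. A sketch of the same transformation in Python, where `add_workflow_output` is a hypothetical name used only for illustration:

```python
def add_workflow_output(event: tuple[str, dict]) -> tuple[str, dict]:
    """Python analogue of the Groovy step: copy state["output"] to "workflow_output"."""
    id_, state = event
    # Like Groovy's `state + [...]`: a new dict is built, the original is left
    # untouched, and keys from the right-hand side win on collision.
    new_state = {**state, "workflow_output": state["output"]}
    return id_, new_state
```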

Comment on lines +13 to +17
// add id as _meta join id to be able to merge with source channel and end of workflow
| map{ id, state ->
def new_state = state + ["_meta": ["join_id": id]]
[id, new_state]
}
Member

Is _meta required here? As far as I can tell, this workflow emits the same number of events as it receives, and the event IDs match, so the events could be joined on their IDs directly.
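To make the reviewer's point concrete: a separate join id only pays off when the events leaving the workflow can no longer be matched to the source events by their own ID, for example because the workflow renamed them. The sketch below joins two lists of `(id, state)` events on a configurable key. It is a plain-Python illustration of the idea, not Nextflow's `join` operator, and `join_by_key` is a hypothetical helper.

```python
def join_by_key(left, right, key=lambda event: event[0]):
    """Join two lists of (id, state) events on a key (hypothetical helper).

    Each key must occur at most once on the right-hand side and every
    left-hand event must have a match. Right-hand state wins on key clashes.
    """
    index = {}
    for event in right:
        k = key(event)
        if k in index:
            raise ValueError(f"Duplicate key on right-hand side: {k!r}")
        index[k] = event

    joined = []
    for event in left:
        k = key(event)
        if k not in index:
            raise KeyError(f"No matching event for key {k!r}")
        joined.append((k, {**event[1], **index[k][1]}))
    return joined
```

When IDs are preserved one-to-one, the default key (the event ID) is enough. Only if the outgoing IDs differ does something like `key=lambda e: e[1]["_meta"]["join_id"]` become necessary.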

}
| view {"After adding join_id: $it"}
// Add 'query' id to .obs columns of query dataset
| add_id.run(
Member

add_id and duplicate_obs could be performed in parallel here by:

  • splitting the input channel into two channels: one for the reference and one for the query
  • performing add_id and duplicate_obs
  • joining the channel back together

And since you call add_id and duplicate_obs twice, please add a key argument to .run (e.g. key: "add_id_query" and key: "add_id_reference"). This ensures that the process names remain unique.
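The split, process and join pattern described above can be sketched in plain Python as an analogy for the Nextflow channels. This is a minimal illustration under assumptions: `split`, `tag` and `run_in_parallel` are hypothetical names, `tag` merely stands in for add_id/duplicate_obs, and the state fields `query` and `reference` are assumed. In Nextflow itself the two branches would run concurrently through dataflow without an explicit thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def split(events):
    """Split each (id, state) event into a query branch and a reference branch."""
    query = [(id_, {"file": state["query"]}) for id_, state in events]
    reference = [(id_, {"file": state["reference"]}) for id_, state in events]
    return query, reference


def tag(branch, label):
    """Stand-in for add_id/duplicate_obs: record which dataset each file is."""
    return [(id_, {**state, "dataset": label}) for id_, state in branch]


def run_in_parallel(events):
    query, reference = split(events)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The branches do not depend on each other, so they can run concurrently.
        q_future = pool.submit(tag, query, "query")
        r_future = pool.submit(tag, reference, "reference")
        query_done, reference_done = q_future.result(), r_future.result()

    # Join the two branches back together on the event id.
    ref_by_id = dict(reference_done)
    return [
        (id_, {"query": q_state, "reference": ref_by_id[id_]})
        for id_, q_state in query_done
    ]
```

Because both branches keep the original event ID, the final join needs no extra bookkeeping.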


assert "rna" in list(input_mudata.mod.keys()), "Input should contain rna modality."
assert all(key in list(input_mudata.mod["rna"].obsm) for key in expected_obsm), f"Input mod['rna'] obsm keys should be: {expected_obsm}, found: {input_mudata.mod['rna'].obsm.keys()}."
assert all(key in list(input_mudata.mod["rna"].obs) for key in expected_obs), f"Input mod['rna'] obs columns should be: {expected_obs}, found: {input_mudata.mod['rna'].obs.keys()}."
Member

Would it be possible to also check the contents of the newly created columns (for example, that the predictions are in fact labels and the probabilities are floats)?
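A sketch of content checks that could follow the column-presence asserts, assuming the predicted labels and probabilities are written to `.obs` of the `rna` modality. `check_predictions` and the column names in the example are hypothetical, since the actual output column names depend on the workflow's arguments. In the test, it would be called along the lines of `check_predictions(input_mudata.mod["rna"].obs, <label column>, <probability column>)`.

```python
import pandas as pd


def check_predictions(obs: pd.DataFrame, pred_col: str, proba_col: str) -> None:
    """Assert that predictions are string labels and probabilities are floats in [0, 1]."""
    assert pred_col in obs.columns, f"Missing column {pred_col!r}"
    assert proba_col in obs.columns, f"Missing column {proba_col!r}"

    # Works for both object and categorical columns; NaN values fail the check.
    labels = obs[pred_col].astype(object)
    assert all(isinstance(v, str) for v in labels), (
        f"{pred_col!r} should contain string labels"
    )

    proba = obs[proba_col]
    assert pd.api.types.is_float_dtype(proba), (
        f"{proba_col!r} should have a float dtype, found {proba.dtype}"
    )
    # between() is inclusive on both ends by default; NaN values fail the check.
    assert proba.between(0.0, 1.0).all(), f"{proba_col!r} should lie in [0, 1]"
```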

@DriesSchaumont self-assigned this on Dec 26, 2024
Labels: None yet
Projects: None yet
4 participants