-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to merge altExps #243
Conversation
…r merge, and adopt new strategy as hopefully a bug squasher - merge main and alt exps separately, and then recombine at the end
…s written yet and tests not yet passing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look, and I think I would approach this a little bit differently:
- make a list of mainExp
- make a list of altExp
- run both lists through the existing
prepare_sce_for_merge
- within this function I would add an argument for
altExp = TRUE
(set toFALSE
by default) - Then inside the existing
prepare_sce_for_merge
function you first check that all features are present in all SCE objects. If they are not shared andaltExp = TRUE
then you would run a separate function that prepares and modifies the assays specifically. Something like:if (missing_features & altExp){ merge_assays }
. - Before running
merge_assays
you could get a list ofassayNames
and then apply themerge_assays
function to each assay. The last step would be creating the "new" altExp SCE before continuing with the rest of theprepare_sce_for_merge
function (what's currently on line 372 - 385).
- within this function I would add an argument for
Let me know what you think of that
Thanks everyone for first round of feedback, back to it today! While reading over comments and while reviewing AlexsLemonade/scpca-nf#605, I had thoughts. Sharing them here to get my thoughts straight and get a chance for someone to tell me I'm missing something! Am I missing something or have I gotten the concept down here?
This comment from @jashapiro brings up a good point which I had not considered. I was operating under the assumption that, because we only allow for 1 altexp at a time in a given SCE, all libraries in a project will have the "same" altexps. This is indeed dangerous! We may see some libraries within a given project (or a different integration group down the line) with adt and others with cellhash. I didn't in-depth check the library metadata to see if this is already the case, but it may indeed be in the future, so we should allow for it. Similarly, I think we already have projects where only some libraries have altexp (adt specifically). So, I'll need to allow here for the possibility of multiple altexps in the final merged object. |
I'll take minor issue with this statement: Otherwise, I think your overall plan should be fine... To state it as I would see it:
One thing we haven't discussed, but in the merged SCEs, we probably want to update the |
This part is happily already there ✅ Lines 237 to 238 in 4a063ff
|
Co-authored-by: Joshua Shapiro <[email protected]>
…rowData construction and new assay construction
This reverts commit bfa6846.
…nd then substitute in existing matrices. Do this for each altExp name, and for each assay within those set of altexps
…ly similar function name to the existing merge_altexp function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look at the changes to the tests you made yet, but I made a few smaller comments throughout the function changes. Generally, I'm good with this approach, I just need some helpful comments so that I remember that colnames of the merged SCE object correspond to cells 🤦♀️
R/merge_sce_list.R
Outdated
# Ensure compatible column names | ||
# (this is probably not necessary but doesn't hurt...) | ||
merged_sce <- merged_sce[,merged_colnames] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to subset here... I think if we are already grabbing all columns directly from the merged SCE, we don't need this or want this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be subsetting, although the syntax is the same. It's just ensuring it's all in the same order as the altexp names. This is the same all_merged_barcodes
variable that was confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small comments below, but I think the general plan looks good.
I see that in the meantime, Ally has put in some comments.
I too want to go through the tests, but I will do that next...
Co-authored-by: Ally Hawkins <[email protected]> Co-authored-by: Joshua Shapiro <[email protected]>
Co-authored-by: Joshua Shapiro <[email protected]>
FYI there isn't much new there. I only added some code to set up for testing altexps, and I added I expect my next steps will be 1) PR to add rowdata+coldata, and 2) PR for testing. Sound ok? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
R/merge_sce_list.R
Outdated
|
||
# First we need to determine the final column names of the merged_sce (not yet made) | ||
# for use in altExp code. Later we'll apply this order to the merged_sce itself. | ||
# These values are cell ids: `{sce_name}_{barcode}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this looks good, but now I see that you only updated the tests that do not include altExps.
I think it would also be useful to also include at least one test for what you expect the updated function to do. You can add edge cases in a subsequent PR, but there should be at least one test that shows that the code you included is working... You should also probably relabel the tests you updated to indicate that they are testing without altExps.
I might rearrange things to leave the tests as they were, starting with an sce_list with no altExp present (which should not require setting include_altexp
), then add a separate section below to make an sce_list that does have altExps and test with both include_altexp = FALSE
and include_altexp = TRUE
.
…altexps=FALSE. Define separate sce_list_with_altexps for next steps
…ing long test suite
I've rearrangement the tested code as suggested in review, and I added a couple tests - first a couple tests that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I might add an issue for adding a few more tests for situations where there are multiple altExps, but this tests the basic functionality, which is kind of the minimal expectation. I also like that the previous tests are left mostly alone, which is a good thing for making sure you aren't introducing too many breaking changes (except directly related to the functionality you are adding).
My suggestion below are pretty minimal, with the biggest one just being that you might as well set the altExp name when you create it rather than requiring a whole separate set of code. Other than that, I think this should be good to go.
tests/testthat/test-merge_sce_list.R
Outdated
# vector of all expected names | ||
full_altexp_names <- names(altExp(sce_list_with_altexp[[2]])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which names are these? I think you mean feature names (colnames
), and I much prefer being explicit here.
tests/testthat/test-merge_sce_list.R
Outdated
# set altExp names themselves - all altexp are the same name here | ||
altexp_name <- "adt" | ||
sce_list_with_altexp <- sce_list_with_altexp |> | ||
purrr::map( | ||
\(sce) { | ||
altExpNames(sce) <- altexp_name | ||
sce # return the sce | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment earlier about doing this at creation time.
tests/testthat/test-merge_sce_list.R
Outdated
} | ||
) | ||
|
||
test_that("merging SCEs with matching features works as expected, with altexps", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by matching features here? There are missing features in 1, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, copy/paste fail...
Co-authored-by: Joshua Shapiro <[email protected]>
Co-authored-by: Joshua Shapiro <[email protected]>
Towards #150
This PR is the first PR at updating the our SCE merging to consider alternative experiments. This process has been much more of a 🎢 than we maybe had imagined, so you can expect more PRs from here. At a minimum, there will be at least 1-2 more PR here for tests which are not yet implemented.
I also want to be cautious that this PR doesn't get too long, since the code changes here are also fairly extensive. I think a good way to handle big reviews is to stack on this branch with separate PRs. Requesting a look from everybody since we may have to have some interesting discussions here! In an effort to keep this PR from getting out of hand too quickly, please focus on leaving high-level feedback about the approach for your first round of review! Thanks!
Ch-ch-ch-ch-changes
include_altexp
to the crux functionmerge_sce_list()
.TRUE
. Eh?include_altexp=FALSE
in all relevant spots in the unit tests. I've updated some other parts in there to lay the groundwork for circling back and adding more tests, but we'll need 1-2 more PRs to get both i) current tests working withinclude_altexp=TRUE
, and ii) tests specifically for the new altexp functions (read on...).prepare_altexps_for_merge()
andupdate_altexp_assay()
.NA
values added in for rowData and counts/logcounts assays (note someTODO
s in the code about handling assays). The existing altExps are therefore replaced with these new ones.prepare_sce_for_merge()
to keep things identifiable in the final merge