Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to merge altExps #243

Merged
merged 35 commits into from
Dec 13, 2023
Merged

Conversation

sjspielman
Copy link
Member

Towards #150

This PR is the first PR at updating the our SCE merging to consider alternative experiments. This process has been much more of a 🎢 than we maybe had imagined, so you can expect more PRs from here. At a minimum, there will be at least 1-2 more PR here for tests which are not yet implemented.

I also want to be cautious that this PR doesn't get too long, since the code changes here are also fairly extensive. I think a good way to handle big reviews is to stack on this branch with separate PRs. Requesting a look from everybody since we may have to have some interesting discussions here! In an effort to keep this PR from getting out of hand too quickly, please focus on leaving high-level feedback about the approach for your first round of review! Thanks!

Ch-ch-ch-ch-changes

  • Bonus spacing updates throughout for style, as well as relevant comment updates
  • I added an argument include_altexp to the crux function merge_sce_list().
    • This currently has a default of TRUE. Eh?
  • Tests are only passing right now because I set include_altexp=FALSE in all relevant spots in the unit tests. I've updated some other parts in there to lay the groundwork for circling back and adding more tests, but we'll need 1-2 more PRs to get both i) current tests working with include_altexp=TRUE, and ii) tests specifically for the new altexp functions (read on...).
  • The overall strategy I've taken is to handle main and alternative experiments separately - we split out the altExps at the beginning. Then, do all main exp processing and merging as before. Finally at the end, we process and merge the altExps, and then pop it into the merged object's altExp.
    • Originally when working on this, I had the altExp processing happening before the SCE merge, but I kept hitting cryptic bugs, so I adopted this strategy of separate processing which seemed to be less mysteriously buggy!
  • Altexp processing proceeds as follows, via two new functions: prepare_altexps_for_merge() and update_altexp_assay().
    • I gave up kind of quickly tbh trying manipulate sparse matrices directly, so I ended up creating new SCEs with NA values added in for rowData and counts/logcounts assays (note some TODOs in the code about handling assays). The existing altExps are therefore replaced with these new ones.
    • These fresh altexps are then passed through the existing function prepare_sce_for_merge() to keep things identifiable in the final merge

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look, and I think I would approach this a little bit differently:

  • make a list of mainExp
  • make a list of altExp
  • run both lists through the existing prepare_sce_for_merge
    • within this function I would add an argument for altExp = TRUE (set to FALSE by default)
    • Then inside the existing prepare_sce_for_merge function you first check that all features are present in all SCE objects. If they are not shared and altExp = TRUE then you would run a separate function that prepares and modifies the assays specifically. Something like: if (missing_features & altExp){ merge_assays }.
    • Before running merge_assays you could get a list of assayNames and then apply the merge_assays function to each assay. The last step would be creating the "new" altExp SCE before continuing with the rest of the prepare_sce_for_merge function (what's currently on line 372 - 385).

Let me know what you think of that

R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
@sjspielman
Copy link
Member Author

Thanks everyone for first round of feedback, back to it today! While reading over comments and while reviewing AlexsLemonade/scpca-nf#605, I had thoughts. Sharing them here to get my thoughts straight and get a chance for someone to tell me I'm missing something! Am I missing something or have I gotten the concept down here?

This will get the first altExp, so we are assuming that all the altExps across the merge are the same, which seems dangerous

This comment from @jashapiro brings up a good point which I had not considered. I was operating under the assumption that, because we only allow for 1 altexp at a time in a given SCE, all libraries in a project will have the "same" altexps. This is indeed dangerous! We may see some libraries within a given project (or a different integration group down the line) with adt and others with cellhash. I didn't in-depth check the library metadata to see if this is already the case, but it may indeed be in the future, so we should allow for it. Similarly, I think we already have projects where only some libraries have altexp (adt specifically).

So, I'll need to allow here for the possibility of multiple altexps in the final merged object.
In addition, we'd have to add in a fully NA altexp for any libraries that don't have an altexp that a different library in the project does have. I have no idea what kind of file size weirdness having a matrix with only NA values will cause, but we're already well down the rabbit of file size with merging in the first place, so such is life! At least adt experiments have way fewer features..!

@jashapiro
Copy link
Member

because we only allow for 1 altexp at a time in a given SCE

I'll take minor issue with this statement: scpca-nf only supports 1 altExp at the moment, but scpcaTools has no such limitation! We can use merge_altExp() repeatedly to add as many altExps as we like.

Otherwise, I think your overall plan should be fine... To state it as I would see it:

  1. Get a vector of all altexps from all libraries with altExpNames() |> union()
  2. loop/walk through that list to make merged altExp objects and
  3. join them them into the merged SCE.

One thing we haven't discussed, but in the merged SCEs, we probably want to update the colnames such that they are a combination of barcode and library_id if we are not doing that already, in order to keep them unique.

@sjspielman
Copy link
Member Author

One thing we haven't discussed, but in the merged SCEs, we probably want to update the colnames such that they are a combination of barcode and library_id if we are not doing that already, in order to keep them unique.

This part is happily already there ✅

# Add `sce_name` to colnames so cell ids can be mapped to originating SCE
colnames(sce) <- glue::glue("{sce_name}-{colnames(sce)}")

…nd then substitute in existing matrices. Do this for each altExp name, and for each assay within those set of altexps
…ly similar function name to the existing merge_altexp function
Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the changes to the tests you made yet, but I made a few smaller comments throughout the function changes. Generally, I'm good with this approach, I just need some helpful comments so that I remember that colnames of the merged SCE object correspond to cells 🤦‍♀️

R/merge_sce_list.R Outdated Show resolved Hide resolved
Comment on lines 224 to 226
# Ensure compatible column names
# (this is probably not necessary but doesn't hurt...)
merged_sce <- merged_sce[,merged_colnames]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to subset here... I think if we are already grabbing all columns directly from the merged SCE, we don't need this or want this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be subsetting, although the syntax is the same. It's just ensuring it's all in the same order as the altexp names. This is the same all_merged_barcodes variable that was confusing.

R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments below, but I think the general plan looks good.

I see that in the meantime, Ally has put in some comments.

I too want to go through the tests, but I will do that next...

R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
@sjspielman
Copy link
Member Author

I too want to go through the tests, but I will do that next...

FYI there isn't much new there. I only added some code to set up for testing altexps, and I added include_altexp=FALSE to everything else! No new actual tests yet.

I expect my next steps will be 1) PR to add rowdata+coldata, and 2) PR for testing. Sound ok?

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


# First we need to determine the final column names of the merged_sce (not yet made)
# for use in altExp code. Later we'll apply this order to the merged_sce itself.
# These values are cell ids: `{sce_name}_{barcode}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 😂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this looks good, but now I see that you only updated the tests that do not include altExps.

I think it would also be useful to also include at least one test for what you expect the updated function to do. You can add edge cases in a subsequent PR, but there should be at least one test that shows that the code you included is working... You should also probably relabel the tests you updated to indicate that they are testing without altExps.

I might rearrange things to leave the tests as they were, starting with an sce_list with no altExp present (which should not require setting include_altexp), then add a separate section below to make an sce_list that does have altExps and test with both include_altexp = FALSE and include_altexp = TRUE.

tests/testthat/test-merge_sce_list.R Show resolved Hide resolved
@sjspielman
Copy link
Member Author

I've rearrangement the tested code as suggested in review, and I added a couple tests - first a couple tests that merge_sce_list() works as expected when altExps are present, and another test that include_altexp=FALSE indeed doesn't return an altExp. Let me know if there are more tests you'd want to see at this stage!

@sjspielman sjspielman requested a review from jashapiro December 13, 2023 15:20
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I might add an issue for adding a few more tests for situations where there are multiple altExps, but this tests the basic functionality, which is kind of the minimal expectation. I also like that the previous tests are left mostly alone, which is a good thing for making sure you aren't introducing too many breaking changes (except directly related to the functionality you are adding).

My suggestion below are pretty minimal, with the biggest one just being that you might as well set the altExp name when you create it rather than requiring a whole separate set of code. Other than that, I think this should be good to go.

R/merge_sce_list.R Outdated Show resolved Hide resolved
Comment on lines 369 to 370
# vector of all expected names
full_altexp_names <- names(altExp(sce_list_with_altexp[[2]]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which names are these? I think you mean feature names (colnames), and I much prefer being explicit here.

tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
Comment on lines 372 to 380
# set altExp names themselves - all altexp are the same name here
altexp_name <- "adt"
sce_list_with_altexp <- sce_list_with_altexp |>
purrr::map(
\(sce) {
altExpNames(sce) <- altexp_name
sce # return the sce
}
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment earlier about doing this at creation time.

}
)

test_that("merging SCEs with matching features works as expected, with altexps", {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by matching features here? There are missing features in 1, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, copy/paste fail...

tests/testthat/test-metadata_to_coldata.R Outdated Show resolved Hide resolved
tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
@sjspielman sjspielman merged commit 8623377 into main Dec 13, 2023
2 checks passed
@sjspielman sjspielman deleted the sjspielman/150-altexps-merging branch December 13, 2023 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants