-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address metadata during altexp merging and other metadata fixes #255
Conversation
Noting I'm going to wait to officially request review here until #251 is merged, but it is ready (aka this is not "draft" status) if folks want to look sooner! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much of this looks good, but I think you made a pretty big change to the contents of library_metadata
in that you are now returning the transposed data. I think we want to leave it as it was, with contents organized first by the SCE name, such that metadata(merged_sce)$library_metadata[["sce1"]]
returns the original metadata as it went in for each SCE.
Note that you may not want to just re-transpose, as that will add in NULL
values for any metadata fields that are not present in all samples.
I had a few other comments below, but I think most of those are more minor, except maybe for the one about preserving order in the altExp metadata.
sce_list |> | ||
purrr::keep(\(sce) altexp_name %in% altExpNames(sce)) |> | ||
purrr::map(altExp, altexp_name) |> | ||
check_metadata() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the library id & sample id in the altExp? I guess it doesn't hurt, but it seems redundant. We have it there mostly because it was there in the original import.
But it is small, so I guess it makes sense to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree it could sort of go either way. I figure if someone wanted to fully pull out the altexp it might be useful to have this information handy?
Noting the GHA error -
|
Co-authored-by: Joshua Shapiro <[email protected]>
This can be safely ignored as a transient cache problem, unless we start to see it on every build. |
Passing locally! ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, a bunch of small comments, but I think we are almost there.
The biggest thing is probably just adding a few tests that we get the sample_metadata
table correct when we merge the metadata. We do check it as part of the main merge checks, a bit, but it would be good to have those tests in the prepare_merged_metadata
tests as well.
R/merge_sce_list.R
Outdated
for (altexp_name in names(altexp_attributes)) { | ||
# For any SCEs without this altExp, create the library_id and sample_id metadata | ||
additional_metadata <- sce_list |> | ||
purrr::keep(\(sce) !(altexp_name %in% altExpNames(sce))) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the named functions and fewer bangs
purrr::keep(\(sce) !(altexp_name %in% altExpNames(sce))) |> | |
purrr::discard(\(sce) (altexp_name %in% altExpNames(sce))) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might actually pull the test into a separate line before the keep and discard, just to save recalculating it, and use that in both keep
and discard
.
has_altexp <- purrr::map_lgl(\(sce) (altexp_name %in% altExpNames(sce))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) | ||
}) | ||
|
||
test_that("prepare_merged_metadata works as expected, with sample_metadata present", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add a couple more tests in here that the sample_metadata
is what we expect as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_that("prepare_merged_metadata works as expected from altExps metadata (aka, sample_metadata NOT present)", { | ||
observed_metadata <- sce_list_with_altexp |> | ||
purrr::map(altExp) |> | ||
purrr::map(metadata) |> | ||
prepare_merged_metadata() | ||
|
||
expect_setequal( | ||
names(observed_metadata), | ||
c("library_id", "sample_id", "library_metadata") | ||
) | ||
|
||
expect_setequal( | ||
observed_metadata$library_id, | ||
glue::glue("library-{names(sce_list)}") | ||
) | ||
expect_setequal( | ||
observed_metadata$sample_id, | ||
glue::glue("sample-{names(sce_list)}") | ||
) | ||
expect_setequal( | ||
observed_metadata$library_metadata |> | ||
purrr::map(names) |> | ||
purrr::reduce(intersect), | ||
c("library_id", "sample_id", "mapped_reads", "ambient_profile") | ||
) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests are probably redundant. You already do much of the required testing for altExp metadata in the tests for the full merge with altExps, so this is really just repeating the prepare_merged_metadata()
tests with a slightly different set of data. But it is good that you are testing that the behavior is correct when sample_metadata
doesn't exist, so I guess maybe leave it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had almost the identical thought process about redundancy and then deciding maybe it was ok!
Co-authored-by: Joshua Shapiro <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made one small variable name suggestion, but I think this should be good to go.
Co-authored-by: Joshua Shapiro <[email protected]>
Closes #250
This (surprise!) PR deals with the metadata situation. To be frank, I did not formally diagnose whatever bug I thought might have been going on in #250 (indeed, there may have been no bug at all...), but I did get everything working as it should be which more or less is the goal anyways!
This was definitely necessary because, while writing tests for #150, I realized we had yet to fully deal with altExp metadata in the first place, and this itself was distracting me while writing said tests. So, I went ahead and tackled the metadata situation now via a bit of code refactoring. IMO, this makes our code much more streamlined and modular, and much easier to follow (for me, at least..) compared to what it was before.
This PR consolidates metadata preparation code as follows:
merge_sce_list()
, I check that the expected metadata fields are present. This is handled by the new functioncheck_metadata()
.prepare_merged_metadata()
- this function now handles every aspect of the metadata, which again I think greatly simplifies the script overall and makes it easier to follow.combineCols
, I also create a "MVM" (minimum viable metadata) list to include in the final merged object withextract_metadata_for_altexp()
. All other potential metadata fields will therefore end up get filled in withNULL
.extract_metadata_for_altexp()
could also be an anonymous function in thepurrr::map()
directly?prepare_merged_metadata()
takes a metadata list rather than SCE so that we can use it in situations where there is no metadata, ala for an altexp that didn't originally existAll is tested and passing locally. Enjoy today's🎢 !