-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
readVizgen()
and readXenium()
#12
Conversation
update from `devel` branch
..major update for `readVizgen()` and few helper functions. It supports old and new Vizgen/Merscope output
..added helper function `callMeta()` to return a df of SFE object
Thank you for the PR! I can change the magrittr pipe to the base pipe so I don't introduce another dependency. The PR will go into the Bioc 3.18 version of Voyager, and R 4.3 is required anyway so the base pipe should be available. Please put the Seurat conversion function in the |
Sounds great, thanks! 👍 |
It seems that I think adding Let me know what changes I would need to do here. |
..some functions are masked from other loaded libs, this seems to break stuff. Using explicit, eg `dplyr::slice()` than `importFrom` -> r-lib/roxygen2#217
While I use Tidyverse in my analyses, I usually stick to base R as far as I can in packages to avoid additional dependencies whenever possible. Unless using Tidyverse has significant performance improvement and that's important to users, I would prefer to use base R. To not to make you spend too much time on this, I can translate the code to base R. |
Since dplyr and magrittr are already commonly used, it's most likely fine, though it doesn't seem too difficult to translate your code to base R, so I'll still try translating. |
Ok, sounds good. Thanks for translating to base R. |
Can you give me a schematic for the directory structure standards for the older and newest Vizgen output you mentioned? This should also include file name conventions. If the column names in the cell metadata and detected transcripts are different between versions, please provide those as well. This is so that I can make toy test data and write unit tests for your function. |
Sure, yeah. Since users might have older data and new data. Here are examples of the output dirs:
For
For
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use_cellpose
is no longer needed
As for upcoming PRs, I can do more base R style such that you don't need to do much of additional translation from pipe magrittr tidyverse style of coding? It's just easier to follow/understand the code that way, especially where there are a number of operations. Generally I try to avoid nested coding style for that reason. |
I put it back for this reason: when running code in a separate R session, such as rendering the package website with |
I can try dusting off my C++ to stream the ginormous transcript spots file line by line without loading the whole thing into memory to construct the raster or a parquet file. In principle it can be parallelized. This way it's possible to never load the whole thing into memory. If all we need is plotting a very zoomed in region, then it shouldn't be necessary to load the whole thing. If making a heatmap of transcript density for one gene, it shouldn't be necessary to load the other genes. |
Yeah, if only for visualization, then it shouldn’t load everything/all genes in to memory. |
Probably you already heard |
I still want something that supports geometric operations, so I would prefer either sf or geoarrow or terra. One could preform point process analyses with spatstat, but that requires a spatstat's own class. |
I'm still working on this pull request and I really tend to overthink. I noted that you would filter the transcript spot coordinates by z-plane, but these coordinates are actually 3D. There're also images at each z-plane. It's just that at least for the example datasets from Vizgen, the cell segmentation is the same in all z-planes. Is that always the case? Since you work with MERFISH data a lot, I wonder if you find it useful to visualize transcript spots from all z-planes or only one z-plane. I think it could be useful to have the option to load data from all z-planes. Also, how useful do you think the cell IDs for transcript spots are? I think already having the transcript assignment to cells can help with analyses on transcript localization within each cell, but when the cell segmentation is only in one z-plane, these analyses can't get very far. If cell IDs are useful, then maybe the transcript spots should be loaded into I'm not prioritizing ways to not load everything into memory for now since I have other features to work on for Bioc 3.18. |
making sure that count matrix and transcripts have the same genes
image metadata `globalMetadata(img)` now has updated `SizeX` and `SizeY`, which correspond to cropped full resolution image cell id from `nucSeg(sfe)` are used after cropping
currently SFE Xenium count matrix is is a class
should we keep them the same, either |
Yes, please keep them the same. Having DelayedMatrix is helpful for larger datasets so you don't have to load everything into memory but it makes computation slower. |
always keep cell ids in |
sounds good, also |
Oh, hold on, I don't mean making them all DelayedMatrix. What I mean is to keep whatever class originally read in. In the case of Xenium, |
ok, then we keep as it is read in |
I think keeping |
`sample_id` are now in row/colGeomentries if `min_phred = NULL` all features/genes will be present, else SFE will have only features present in rowGeomentries
..make sure all features are present when ->`min_phred = NULL`
Hi
as discussed in #3 I did a major update for
readVizgen()
to handle older and the newest Vizgen/Merscope output data, as well as.filter_polygons()
FALSE
as default, not sure what would be the best?filter_counts = FALSE
# if to keep cells with counts > 0, this will subsetpolys
as well.add_molecules = FALSE
# if to add molecule/transcript coords torowGeometries
use_bboxes = FALSE
# if no segmentation output present, usecell_metadata
to make bounding boxes insteadI tested it on few old-new datasets we have here, all went fine on my side.
Only plotting including the image_id has a small issue with ->
Voyager:::geom_spi_rgb(., max_col_value = 255)
, I will open a separate issue for Voyager on this.I use
tidyverse
and pipe operator frommagrittr
a lot when implementing, hope that's ok.I will work further on converter from/to SFE <-> Seurat, is
utils.R
the proper place to add it?This is the run example: