Mismatch between gene expression and features #2277

Open
TomVuod opened this issue Mar 3, 2025 · 1 comment
Labels
bug Something isn't working

Comments


TomVuod commented Mar 3, 2025

There is a bug in the ArrowRead.R file that produces a mismatch between the rows of the gene expression matrix and the features in rowData. The relevant code is:

matchI <- match(i, idxRows, nomatch = 0)
idxI <- which(matchI > 0)
i <- i[idxI]
j <- j[idxI]
i <- matchI[idxI]

The last line replaces the row indices with the corresponding indices of the sorted genes. This behavior has no effect as long as the genes are already sorted, meaning that the idx vector is identical to seq_along(idx). However, if the genes are unsorted, this operation alters the row positions of at least some elements in the assembled matrix, leading to inconsistencies. As a result, the new row indices no longer align with the rows in rowData, which is retrieved independently.
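A toy illustration of the remapping described above may help. It uses plain vectors rather than ArchR internals: the values of `i`, `j`, and `idxRows` are made up, and the names `i_kept`, `j_kept`, and `i_remapped` are hypothetical, introduced only so the original and remapped indices can be compared side by side.

```r
# Toy data only: these values are assumptions, not taken from ArchR.
idxRows <- c(3, 1, 4, 2)   # unsorted feature indices
i <- c(1, 2, 3, 4, 3)      # row indices of the stored non-zero entries
j <- c(1, 1, 2, 2, 3)      # column indices of the same entries

# Same steps as the snippet in the report
matchI <- match(i, idxRows, nomatch = 0)
idxI <- which(matchI > 0)
i_kept <- i[idxI]
j_kept <- j[idxI]
i_remapped <- matchI[idxI]  # the line that replaces the row indices

# With unsorted idxRows the row positions change
print(rbind(original = i_kept, remapped = i_remapped))

# With sorted idxRows the remapping is a no-op
i_sorted <- match(i, sort(idxRows), nomatch = 0)
print(identical(i_sorted, as.integer(i)))
```

In the unsorted case the entry originally in row 1 ends up in row 2, so any feature table that is still in the original order no longer lines up with the matrix rows.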
To reproduce this bug, it is sufficient to pass addGeneExpressionMatrix a SummarizedExperiment whose genes are not sorted by their position in the genome. After saving the ArchR project and loading it from the Arrow files, the gene expression values no longer match the gene names in the rows where
rowData(GeneExpressionMatrix)$idx != seq_along(rowData(GeneExpressionMatrix)$idx).
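The condition above can be sketched on a small stand-in table. Here `feature_df` is a hypothetical data frame used in place of `rowData(GeneExpressionMatrix)`, and the gene names and `idx` values are invented; only the comparison against `seq_along()` comes from the report.

```r
# Hypothetical stand-in for rowData(GeneExpressionMatrix); values are made up.
feature_df <- data.frame(
  name = c("geneA", "geneB", "geneC", "geneD"),
  idx  = c(1L, 3L, 2L, 4L)
)

# Rows whose stored index differs from their position in the table
affected <- which(feature_df$idx != seq_along(feature_df$idx))

# Names of the features that would be affected in this toy table
print(feature_df$name[affected])
```

For a real project, the same comparison would be run on the `idx` column of the loaded matrix's rowData to list the rows worth inspecting.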

TomVuod added the bug label Mar 3, 2025
@immanuelazn
Collaborator

Hi @TomVuod! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.

If you are getting an error, it is likely due to something specific to your dataset, usage, or computational environment, all of which are extremely challenging to troubleshoot. As such, we require reproducible examples (preferably using the tutorial dataset) from users who want assistance. If you cannot reproduce your error, we will not be able to help.
Before going through the work of making a reproducible example, search the previous Issues, Discussions, function definitions, or the ArchR manual and you will likely find the answers you are looking for.
If your post does not contain a reproducible example, it is unlikely to receive a response.

In addition to a reproducible example, you must do the following things before we help you, unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Did you post your log file? If not, add it now.
3. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.
