Calculating ArchR Gene Score Matrix for bulk epigenetic data #2175
Unanswered. Al-Murphy asked this question in Questions / Documentation.
Replies: 1 comment
-
Just as an update on this, I ran the same approach (the second one from above) on many more cell types and histone marks and found equally poor correlations:
Note that some of these marks are repressive, so I would take a strongly negative correlation as a good result too.
-
I'm trying to calculate a gene score matrix for bulk ChIP-Seq histone mark data (e.g. H3K27ac from Roadmap/ENCODE). I believe these gene scores could be good predictors of expression more generally, rather than only being used to call marker genes. However, I'm having issues formatting the bulk ChIP-Seq data so that ArchR will accept it:
I start with bedGraph data containing the mapped count of reads:
This looks pretty similar to the single-cell, fragments file ArchR happily accepts:
The only difference is that the cell barcode is missing. To get around this, I created random barcodes (col V5) that split my 50 million reads across 500 'cells':
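For readers who want to reproduce this step, the barcode assignment could look roughly like the sketch below. It is not necessarily the code used for this question: the function name `assign_pseudo_barcodes`, the `pseudocell_` naming scheme, and the toy rows are all hypothetical. The only details taken from the post are that the barcode goes in a fifth column and that there are 500 pseudo-cells.

```python
import random


def assign_pseudo_barcodes(rows, n_cells=500, seed=0):
    """Append a randomly chosen pseudo-cell barcode to each bedGraph row.

    rows: iterable of (chrom, start, end, count) tuples.
    Returns a list of (chrom, start, end, count, barcode) tuples, i.e. the
    barcode ends up in the fifth column, as described in the post.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    # Hypothetical barcode names; any unique strings would do.
    barcodes = [f"pseudocell_{i:04d}" for i in range(n_cells)]
    return [tuple(row) + (rng.choice(barcodes),) for row in rows]


# Toy bedGraph-like rows (made-up coordinates, for illustration only).
rows = [
    ("chr1", 10000, 10200, 3),
    ("chr1", 10500, 10650, 1),
    ("chr2", 500, 720, 2),
]
tagged = assign_pseudo_barcodes(rows, n_cells=500, seed=1)

for rec in tagged:
    # Tab-separated output, one region per line.
    print("\t".join(str(field) for field in rec))
```

Sampling barcodes uniformly spreads the reads roughly evenly across pseudo-cells, which is an assumption of this sketch rather than something stated in the post.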
With fewer than 500 cells, ArchR fails, I believe with errors about the number of cells. With this approach, I can create an Arrow file (`createArrowFiles`) and get the gene score matrix. I then pseudobulk the scores across the 500 'cells' (I tested both mean and sum) to get one score per gene.

The issue is that the correlation between these scores and the true expression for the same cell type (bulk RNA-Seq) is essentially random: -0.050094. I know there are issues with my approach, but I would have expected it to be considerably higher even so.

So my question is: how can I improve this? I also tried duplicating the data, keeping the 500 'cells' above and adding one more cell containing all 50 million reads. I had to increase the `maxFrags` parameter, but I did get `createArrowFiles` to run. I then filtered the gene score matrix to just the cell with all of the reads. However, the correlation between these scores and the true expression was equally poor: -0.054114.

Any advice or help on this would be great!
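To make the evaluation step concrete, here is a minimal sketch of pseudobulking per-gene scores and correlating them with expression. The post does not say which correlation was used, so the rank-based (Spearman) correlation here is an assumption, and the helper names (`pseudobulk`, `ranks`, `pearson`, `spearman`) and the toy values are hypothetical. It works on plain Python dictionaries, not on an ArchR object.

```python
from math import sqrt
from statistics import mean


def pseudobulk(scores_by_gene, how="mean"):
    """Collapse per-cell gene scores into one score per gene (mean or sum)."""
    agg = mean if how == "mean" else sum
    return {gene: agg(vals) for gene, vals in scores_by_gene.items()}


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: three genes, three pseudo-cells each (made-up numbers).
scores = {
    "GENE_A": [0.0, 1.0, 2.0],
    "GENE_B": [3.0, 3.0, 3.0],
    "GENE_C": [5.0, 7.0, 6.0],
}
expression = {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 10.0}

bulk = pseudobulk(scores, how="mean")
genes = sorted(bulk)  # same gene order for both vectors
rho = spearman([bulk[g] for g in genes], [expression[g] for g in genes])
print(rho)
```

A rank-based correlation is one reasonable choice here because gene scores and expression values are on different, typically skewed scales; Pearson on log-transformed values would be another option.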