Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong annotation (distal intergenic instead of downstream) with annotatePeak #210

Open
beryljones opened this issue Nov 22, 2022 · 0 comments

Comments

@beryljones
Copy link

beryljones commented Nov 22, 2022

Hello,

Thank you very much for this package! I seem to have an error in the annotatePeak output that I cannot find an existing issue about. I am using R version 4.1.2 and ChIPseeker_1.30.3. Here is the rest of the relevant session_info():

#R version 4.1.2 (2021-11-01)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows 10 x64 (build 19044)

#Matrix products: default

#locale
LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
LC_NUMERIC=C LC_TIME=English_United States.1252

#attached base packages:
stats4 stats graphics grDevices utils datasets methods base

#other attached packages:
ggimage_0.3.1 ggplot2_3.4.0 ggupset_0.3.0 GenomicFeatures_1.46.5 AnnotationDbi_1.56.2
Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.2
BiocGenerics_0.40.0 ChIPseeker_1.30.3

I have a custom genome/annotation, so I first make a txdb object from my gff3:

library(GenomicFeatures)
test_txdb <- makeTxDbFromGFF(file="chr3_subset.gff3",
dataSource="chr3_subset.gff3",
organism="Lasioglossum albipes")

Here are a list of regions I want to annotate (small subset to illustrate the issue):
LALB_chr_3 2627151 2628080
LALB_chr_3 2659532 2660013
LALB_chr_3 2664362 2666431

And here are relevant proximal genes (just showing the "gene" entry from the gff3):
LALB_chr_3 maker gene 2623697 2633465 . - . ID=LALB_04121;Name=LALB_04121
LALB_chr_3 maker gene 2660123 2661437 . - . ID=LALB_04122;Name=LALB_04122

I would like to annotate a region as downstream if it is within 5000 bp of the end of the gene (on either strand). I found elsewhere on this site that I can use the options(ChIPseeker.downstreamDistance = 5000) to accomplish this. Here is what I run:

options(ChIPseeker.downstreamDistance = 5000)
peakAnno <- annotatePeak("test_chr3_regions.bed",
level="gene",
sameStrand=FALSE,
ignoreDownstream=FALSE,
ignoreUpstream=FALSE,
assignGenomicAnnotation=TRUE,
addFlankGeneInfo=TRUE,
overlap="all",
flankDistance = 10000,
genomicAnnotationPriority = c("Promoter", "Intron", "5UTR", "3UTR", "Exon", "Downstream", "Intergenic"),
tssRegion=c(-10000, 0),TxDb=test_txdb, verbose=TRUE)

Here is the output:
seqnames start end width strand annotation geneChr geneStart geneEnd
1 LALB_chr_3 2627152 2628080 929 * Intron (LALB_04121-RA/LALB_04121, intron 3 of 5) 1 2623697 2633465
2 LALB_chr_3 2659533 2660013 481 * Distal Intergenic 1 2660123 2661437
3 LALB_chr_3 2664363 2666431 2069 * Promoter (2-3kb) 1 2660123 2661437
geneLength geneStrand geneId distanceToTSS flank_geneIds flank_gene_distances
1 9769 2 LALB_04121 5385 LALB_04121 0
2 1315 2 LALB_04122 1424 LALB_04122 1424
3 1315 2 LALB_04122 -2926 LALB_04122 -2926

The second region is annotated as "Distal Intergenic", but it is only 110bp downstream from the gene LALB_04122. Even the default downstream distance (300, I think) should be greater than 110bp, so the options(ChIPseeker.downstreamDistance = 5000) should not matter here, although I do want to include regions that are within 5kb.

Happy to send files if needed or answer any other questions. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant