Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flip() function for genes containing introns #91

Open
xvtyzn opened this issue Oct 15, 2021 · 3 comments
Open

flip() function for genes containing introns #91

xvtyzn opened this issue Oct 15, 2021 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@xvtyzn
Copy link

xvtyzn commented Oct 15, 2021

Hi again,

I faced the problem of flip function against genes containing intron.
This is from public database.

tmp_gg <- read_gff3("tmp.gff3.txt") %>%
  rename_seq() %>%
  unchop(parent_ids) %>%
  gggenomes() + 
  geom_gene(aes(fill=type, group=parent_ids), size = 5)  +
  scale_fill_brewer(palette = "Set1")

tmp_gg

cd551f25-2564-417c-a067-83cbd0f70345

tmp_gg %>% flip_seqs(4)

cc495749-aaa6-4b36-9582-cb7dfd2af960

The exon does not reverse correctly.

rename_seq function is the following self-made function. It is essentially run to recognize genes that are on the same genome as different bins. It is probably specific to this data and not a common function.

rename_seq <- function(gff_data, gene_list = NA){
  if(!is.na(gene_list)){
    renamed_genes <- gff_data %>%
      dplyr::filter(parent_ids %in% gene_list | 
                      feat_id %in% gene_list) 
  } else {
    renamed_genes <- gff_data %>%
      arrange(seq_id) 
  }
  
  renamed_gff <- renamed_genes %>%
    mutate(seq_ids_psuedo = ifelse(grepl(".m1$",parent_ids),
                                 str_replace(parent_ids, 
                                             pattern=".m1$", replacement=""), 
                                 parent_ids)) %>%
    mutate(seq_id = seq_ids_psuedo) %>%
    unchop(seq_id)
  return(renamed_gff)
}

I guess it may be related to #65, but I don't know the details.
Thanks in advance.

Keigo

tmp.gff3.txt

@thackl
Copy link
Owner

thackl commented Oct 15, 2021

Yep, that is definitely off! Right now, there is good and bad news.

Good news, I'm testing a fix for #65, which seems to work and should make your life easier (will upload soon).

# read_gff3 now can handle Augustus CDS
g0 <- read_gff3("issue_91/tmp.gff3.txt", fix_augustus_cds = TRUE)
# rename_seq() a bit more concise ;)
g1 <- g0 %>% mutate(seq_id = str_remove(map(parent_ids, 1), ".m1$"))

gggenomes(g1) + 
  geom_gene(aes(fill=type), size = 5)  +
  scale_fill_brewer(palette = "Set1")

image

Bad news, fixing flip() needs some more thought. The problem are the exons. All other features can simply be flipped by toggling their strand. This is also true for CDS because I merge all parts from one CDS into one feature internally. However, exons stay independent features, so instead of flipping all exons of one gene together (like all CDS parts), each exon is flipped on its own, which doesn't give the intended outcome. Need to ponder...

@thackl thackl self-assigned this Oct 15, 2021
@thackl thackl added the bug Something isn't working label Oct 15, 2021
@xvtyzn
Copy link
Author

xvtyzn commented Oct 16, 2021

Wow, thanks a lot.

Solving the Augustus problem is definitely great for me.
Thanks for telling me about the str_remove function too!

OK, it’s seem to be difficult to take exon into account...

@biubiu-1
Copy link

biubiu-1 commented Nov 9, 2023

Hope the exon flipping issue got fixed someday. That will be really useful to illustrate the introns of eukaryotic genomes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants