No taxIDs for proteomes #36
Dear @bheimbu, Thank you for reaching out! If you have a proteome from a species that is not yet described in the NCBI Taxonomy, you can "borrow" the taxID of another species from the same genus that doesn't have any data in the NR database.

Let me give you an example: let's say you have a proteome from an as yet undescribed species of the genus Arabidopsis. Since it doesn't have an NCBI Taxonomy ID, you could borrow the taxID of Arabidopsis lyrata x Arabidopsis halleri, which doesn't have any published proteins in the NR database. This way you can still make use of all the proteomic data available within the genus and find species-specific genes.

The results will be the same as if you had a taxID for your undescribed species. Just bear in mind that all the results will carry the name Arabidopsis lyrata x Arabidopsis halleri, which you can change later. I hope this helps, let me know if you need anything else. Cheers,
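One way to check whether a candidate taxID is "empty" before borrowing it is to ask NCBI how many protein records it has. The following is a minimal sketch, not part of genEra: it assumes the NCBI E-utilities `esearch` endpoint and the Entrez `protein` database as a rough stand-in for NR (the two are not identical), and the function names `build_count_url`, `parse_count` and `protein_count` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(taxid: int) -> str:
    """Build an esearch URL that asks only for the number of protein records."""
    params = {
        "db": "protein",                       # assumption: Entrez protein db as a proxy for NR
        "term": f"txid{taxid}[Organism:exp]",  # all records at or below this taxID
        "rettype": "count",
        "retmode": "json",
    }
    return EUTILS_ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(response_text: str) -> int:
    """Extract the record count from an esearch JSON response."""
    return int(json.loads(response_text)["esearchresult"]["count"])


def protein_count(taxid: int) -> int:
    """Query NCBI (requires network access) and return the protein record count."""
    with urllib.request.urlopen(build_count_url(taxid), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `protein_count(<candidate taxID>)` and getting `0` would suggest that the taxID is a reasonable candidate to borrow. The taxID has to be looked up in the NCBI Taxonomy first; none is given in this thread.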
Hi @josuebarrera, that should actually be no problem for me. I'm working with oribatid mites, which means there is not much protein data available anyway. I'll try your suggestion, though. Also, can I ask you an unrelated question here, or should I open a new thread? Cheers Bastian
Dear @bheimbu, I'm glad this workaround works for you!
Awesome. First, some background info: I'm working with oribatid mites, as mentioned before. I've found orthologs of some very interesting proteins throughout Oribatida, and I presume that these genes were acquired via horizontal gene transfer from either bacteria or fungi, probably a very long time ago, since oribatids are an ancient group.

So I took these orthologs (protein sequences) and ran genEra. Most of them are assigned to "cellular organisms", but some are flagged as probable "contamination or HGT". To be honest, I don't understand the idea behind the latter in detail. What exactly does it mean if homologs can only be found, e.g., from cellular organisms to Opisthokonta and then again in the species of interest? Does this imply a probable HGT event from an organism within or below Opisthokonta to my species of interest?

And regarding "cellular organisms": does it mean that these protein-coding genes were already present in the LUCA? So these genes are ancient and have been neo-functionalized?

I really need some help here. Find attached my gene_ages.tsv. Sorry for all these questions, I'm really new to phylostratigraphy. Cheers Bastian
Dear @bheimbu, Sorry for the delayed response. You made a very good inquiry. We developed a "taxonomic representativeness" score in
Where:
Therefore, a taxonomic representativeness score of 100 means that Given your results, I would probably establish a parameter Cheers,
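The hit pattern asked about above (homologs from "cellular organisms" down to Opisthokonta, then nothing until the species of interest) can be made concrete with a small sketch. This is a generic phylostratigraphy illustration, not genEra's implementation, and it does not compute the taxonomic representativeness score. The function name `assign_age` and the abbreviated lineage are assumptions made for the example. The gene age is taken as the oldest taxonomic level with a homolog, and the levels without homologs between that level and the query species are reported as gaps.

```python
from typing import List, Set, Tuple


def assign_age(lineage: List[str], levels_with_hits: Set[str]) -> Tuple[str, List[str]]:
    """Return (oldest level with a homolog, younger levels that lack homologs).

    `lineage` is ordered from oldest to youngest and ends with the query species.
    """
    present = [level in levels_with_hits for level in lineage]
    if not any(present):
        raise ValueError("no level in the lineage has a homolog")
    oldest = present.index(True)  # first (oldest) level with a hit
    gaps = [lineage[i] for i in range(oldest + 1, len(lineage)) if not present[i]]
    return lineage[oldest], gaps


# Abbreviated, illustrative lineage for an oribatid mite (oldest first).
lineage = [
    "cellular organisms", "Eukaryota", "Opisthokonta",
    "Metazoa", "Arthropoda", "Oribatida", "species of interest",
]
hits = {"cellular organisms", "Eukaryota", "Opisthokonta", "species of interest"}

age, gaps = assign_age(lineage, hits)
print(age)   # cellular organisms
print(gaps)  # ['Metazoa', 'Arthropoda', 'Oribatida']
```

A long run of gaps between the oldest hit and the query species is the kind of pattern that raises the question of HGT or contamination. Whether a given gap is meaningful still depends on how well each level is sampled in the database.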
Dear @josuebarrera, many thanks for your reply. I'm relieved that I'm not completely wrong here ;) Thanks for the clarification. I'm wondering whether I could use a different value for '-l' (lower than 65) when I use the option '-a'. I could also try the option '-s', as I have a phylogenetic tree that includes all my oribatid mite species of interest. This should work then, right? Cheers Bastian
Just a quick question: must the phylogenetic tree be based on the same protein data used for the analysis of gene-founder events? Or can it be any tree that includes all species of interest? I highly doubt it, though. Cheers Bastian
Dear @bheimbu, You don't need the phylogeny of your mites to add the proteomes to your analysis. As you mentioned, you can use the option Once you finish your initial Hope this helps, let me know if you need anything else! Cheers,
Dear @josuebarrera, so there would be no additional benefit of using Cheers Bastian
Dear @bheimbu, The parameter The other option to add a phylogenetic tree is using parameter Hope this info helps! Cheers,
Dear @josuebarrera,
great software, many thanks for this. I have a question about unpublished proteomes without taxIDs. Is it possible to use the taxID of the genus instead, or is it simply not possible to use such proteomes?
Cheers Bastian