lhallee/CUF-ORF


Code for https://doi.org/10.1101/2022.07.20.500846

Title: Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life

Abstract: In this study, we investigate how an organism’s codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
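As a rough illustration of the two ideas in the abstract (codon usage frequencies as features, and enumerating candidate ORFs to be scored), here is a minimal Python sketch. It is not the code from this repository or the paper's statistical model: the function names `codon_usage` and `find_orfs`, the `min_codons` parameter, the use of the standard genetic code's stop codons, and the forward-strand-only scan are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet, in a fixed order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)
# Stop codons of the standard genetic code (an assumption; other codes differ).
STOPS = {"TAA", "TAG", "TGA"}


def codon_usage(seq):
    """Return the relative frequency of each of the 64 codons,
    reading seq in frame from position 0."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Codons containing ambiguous bases (e.g. N) are ignored.
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


def find_orfs(seq, min_codons=2):
    """List (start, end, orf) for every ATG...stop stretch in the three
    forward reading frames. min_codons is the (hypothetical) minimum
    number of codons before the stop codon, counting the start codon."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


if __name__ == "__main__":
    sample = "CCATGAAATTTTAGGG"
    for start, end, orf in find_orfs(sample):
        usage = codon_usage(orf)
        used = {c: f for c, f in usage.items() if f > 0}
        print(start, end, orf, used)
```

In a pipeline like the one the abstract describes, the 64-value frequency vector of each candidate ORF would then be compared against a reference codon usage distribution to accept or reject it; that scoring step is not shown here.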

Link to Dr. Bohdan B. Khomtchouk's GitHub: https://github.com/Bohdan-Khomtchouk/codon-usage

Link to dataset in UCI machine learning repository: https://archive-beta.ics.uci.edu/dataset/577/codon+usage

Please cite as follows: Hallee, L., Khomtchouk, B.B. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci Rep 13, 2088 (2023). https://doi.org/10.1038/s41598-023-28965-7

BibTeX:

@misc{Hallee2023,
  author = {Hallee, L. and Khomtchouk, B.B.},
  title = {Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life | Scientific Reports},
  howpublished = {\url{https://www.nature.com/articles/s41598-023-28965-7#citeas}},
  month = {February},
  year = {2023},
  note = {}
}

About

CUF Classification and ORF Identification
