"Keep calm, use AI for phages and stop AMR"
PHAGES2050 is a novel Python 3.8+ framework that applies Natural Language Processing (NLP) and Deep Learning to boost bacteriophage research, therapy and infrastructure, so that phages can reach their full potential in the fight against antimicrobial-resistant bacteria.
Our project is developing an AI-based framework for microbiologists and bioinformaticians who hunt, explore and classify phages. Applying the framework will shorten the computational work required to match phages with bacteria for specific patient cases. Having such an organised, freely available framework at hand will help develop personalized phage therapy and make it accessible to people worldwide.
Watch the PHAVES #3 talk to learn more.
Framework modules | Usage | Documentation | Installation | Community and Contributions | Have a question? | Found a bug? | Team | Change log | Code of Conduct | License
crawlers
- set of functions for scraping bacteriophage data from different sources (MillardLab, NCBI)
features
- set of functions responsible for nucleotides and proteins feature extraction for Machine Learning classification and deeper analysis
embeddings
- set of pre-trained Embedding models for nucleotides and proteins vectorization
classifiers
- set of pre-trained Machine Learning models dedicated to bacteriophage research
explore
- set of 2D and 3D data visualization techniques for deeper bacteriophage exploration
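To give a feel for what sequence-based feature extraction involves, here is a minimal sketch in plain Python that computes GC content and normalised k-mer frequencies for a nucleotide sequence. It is an illustration only and does not show the API of the features module; the function names `gc_content` and `kmer_frequencies` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def kmer_frequencies(sequence: str, k: int = 2) -> Dict[str, float]:
    """Return the relative frequency of every possible A/C/G/T k-mer.

    Windows containing other symbols (for example N) are ignored.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i : i + k] for i in range(len(sequence) - k + 1))
    # Fixed, ordered vocabulary so each sequence yields a vector of equal length.
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


# Hypothetical toy sequence, not real phage data.
features = kmer_frequencies("ATGCGCGTTA", k=2)
print(gc_content("ATGCGCGTTA"), len(features))
```

A fixed k-mer vocabulary means every sequence maps to a vector of the same length (4^k entries), which is the shape most Machine Learning classifiers expect.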
The repository includes numerous examples of using the framework in Jupyter Notebook format (*.ipynb). Those most requested by the community are listed below:
- MillardLab bacteriophage crawler
- NCBI bacteriophages crawlers (planned):
- taxonomy, host and other expected meta-data;
- complete genome sequences in FASTA format;
- set of genes and proteins in FASTA format;
- Bacteriophage proteins embedding
- Bacteriophage DNA embedding
- Bacteriophage sequence-based biological and biochemical features extraction (planned)
- Bacteriophage structural protein classifier with 95% accuracy
- Bacteriophage lifecycle classifier including chronic infection (planned)
- Bacteriophage taxonomy classifier (planned)
- Bacteriophage prophage detector and extractor (planned)
- Lysis zones multi-level classification (in progress)
- Bacteriophages in 3D space based on:
- DNA embedding (planned)
- proteins embedding (planned)
- biological and biochemical features (planned)
- custom user features (planned)
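Several of the notebooks above work with sequences in FASTA format. As a rough illustration of that input, the sketch below reads FASTA records into a dictionary using only the standard library. The helper `parse_fasta` is hypothetical and is not part of PHAGES2050; for real projects an established parser such as Biopython's `SeqIO` is the safer choice.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its concatenated sequence.

    The identifier is the first whitespace-delimited token after '>'.
    """
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks[current] = []
        elif current is not None:
            # Sequence data may be wrapped over several lines.
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


# Hypothetical toy records, not real phage genomes.
example = [">phage1 example record", "ATG", "CCA", "", ">phage2", "GGT"]
records = parse_fasta(example)
print(records)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `parse_fasta(open(path))` with a path of your own.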
The official documentation is hosted on ReadTheDocs: https://phages2050.readthedocs.io
PHAGES2050 can be installed by running:
pip install phages2050
It requires Python 3.8.0+ to run. You can also use Conda:
conda install -c conda-forge phages2050
If you can't wait for the latest hotness and want to install from GitHub, use:
pip install git+https://github.com/ptynecki/PHAGES2050
If you want to use the bacteriophage protein vectorizers, remember to install the extra packages for protein embedding:
pip install -U "bio-embeddings[all] @ git+https://github.com/sacdallago/bio_embeddings.git"
pip install git+https://github.com/facebookresearch/esm.git
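After installing, you may want to confirm that the interpreter meets the 3.8 requirement and that the package is visible to it. The following is a small sketch using only the standard library; it assumes the distribution name `phages2050` from the `pip` command above, and the helper names are hypothetical.

```python
import sys
from typing import Optional, Sequence

MIN_PYTHON = (3, 8)


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True when the interpreter is at least Python 3.8."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(distribution: str = "phages2050") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    # importlib.metadata is part of the standard library from Python 3.8.
    from importlib import metadata

    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("phages2050 version:", installed_version())
```

A result of `None` for the version means the package is not installed in the environment that runs the script, which often points to a mismatch between the `pip` and `python` executables in use.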
We are happy to see you willing to make PHAGES2050 better. Development on the latest stable version of Python 3 is preferred. As of this writing it's 3.8. You can use any operating system.
If you're fixing a bug or adding a new feature, add a test with pytest and check the code with Black and mypy. Before adding any large feature, first open an issue for us to discuss the idea with the core devs and community.
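As a sketch of what such a contribution might look like, the example below pairs a small type-annotated helper with pytest-style tests. The helper `reverse_complement` is hypothetical and used only for illustration; it is not taken from the PHAGES2050 code base. pytest discovers functions prefixed with `test_` and reports any failing `assert`.

```python
def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an A/C/G/T DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence.upper()))


def test_reverse_complement_of_short_sequence() -> None:
    assert reverse_complement("ATGC") == "GCAT"


def test_reverse_complement_is_case_insensitive() -> None:
    assert reverse_complement("atgc") == "GCAT"


def test_reverse_complement_twice_returns_original() -> None:
    sequence = "GATTACA"
    assert reverse_complement(reverse_complement(sequence)) == sequence
```

Saved in a file whose name starts with `test_`, this can be run with `pytest`, formatted with `black`, and type-checked with `mypy` from the command line.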
If you have a private question or want to cooperate with us, you can always reach out to us directly via our Phage Directory Slack (channel #PHAGES2050).
Feel free to open a new issue with a descriptive title and description on the PHAGES2050 repository. If you have already found a solution to your problem, we would be happy to review your pull request.
Core Developers, Domain Experts, Community Managers and Educators who contribute to PHAGES2050:
- Piotr Tynecki
- Yana Minina
- Iwona Świętochowska
- Przemysław Mitura
- Joanna Kazimierczak
- Arkadiusz Guziński
- Bogusław Zimnoch
- Jessica Sacher, PhD
- Shawna McCallin, PhD
- Marie-Agnes Petit, PhD
- Jan Zheng
The change log will become rather long, so it has been moved to its own file.
See CHANGELOG.md.
Everyone interacting in the PHAGES2050 project's development, issue trackers and Slack discussion is expected to follow the Code of Conduct.
The PHAGES2050 package and pre-trained models are released under the terms of the MIT License.