Mapping proteins to functions: method and benchmark development #1
This hackathon project will be merged with #5 by @rababerladuseladim:
Abstract
Metaproteomics is the analysis of proteins in samples composed of multiple organisms. One major use case is the investigation of the functional composition of a sample. A multitude of functional annotation databases are available, which vary strongly in quality, price and accessibility. Multiple tools can connect identified sequences with functional information (e.g. Unipept, Prophane, MetaGOmics). One of these tools, Unipept, was recently expanded with a basic functional analysis pipeline. Functional annotations are directly linked to proteins, for which taxonomic information is also available. This link allows researchers to reveal which functions are performed by which organisms and vice versa.
By expanding the Unipept functional analysis pipeline with support for metabolic pathways, we can further increase researchers' insight into the complex processes taking place in an environment. To achieve this, we can choose from several functional annotation and pathway databases. To determine the best way forward, we need to overcome two challenges: (1) ideally build a prototype for each data source and (2) benchmark each of these prototypes against a gold standard database. Because data with known ground truth at the functional level is lacking, no such gold standard exists at this point, which makes it very hard to assess the performance of each pipeline and to compare tools with each other. This project proposal aims to develop a concept for how an ideal gold standard dataset should be composed, and to generate it accordingly. We could then use it to evaluate several tools and potential annotation sources for Unipept.
Work plan
Technical details
Currently, Unipept consists of a set of bash scripts and Java tools to extract information from UniProt. The Unipept framework is built with Ruby on Rails, but its APIs can be queried from any programming language and return standard JSON. The Unipept visualisation tools are written in JavaScript and TypeScript. To construct a gold standard benchmarking database, we will use reference databases such as SwissProt, from which datasets are derived. We will benchmark Unipept, Prophane and MetaGOmics on our new database.
Contact information: Bart Mesuere - Ghent University (Belgium) - [email protected]
Abstract
Unipept is an ecosystem of tools for the taxonomic and functional analysis of (meta)proteomics datasets. The Unipept project aims to be easy to use and highly accessible by providing users with a web-based application. A command-line interface (CLI) and an API are also provided, allowing users to process more samples and increase analysis throughput. Unipept started with a taxonomic analysis pipeline, which was recently expanded with a new functional analysis pipeline that supports GO terms and EC numbers. Functional annotations are directly linked to proteins, for which taxonomic information is also available. This link allows researchers to reveal which functions are performed by which organisms and vice versa. This project proposal aims to improve Unipept’s functional analysis pipeline with annotations for metabolic pathways. If desired, we could also take a more generic approach and explore how best to link different functional annotations, how to map them to the available pathway data sources, and how to visualise them. The results could be useful for other projects as well.
Work plan
A first step would be to identify and score the available functional annotations and to determine how they could be combined. We already have experience with GO terms, EC numbers and InterPro annotations within Unipept.
Next, the same needs to be done for metabolic pathway resources. Each of these should be scored on data quality, coverage, availability and ease of use. We will start by building a small prototype for each resource and benchmarking it. Candidates that we would like to include in our comparison are Reactome, KEGG, MetaCyc, and BioCyc.
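One possible way to benchmark a prototype is to compare its predicted annotations with a reference set of known annotations, such as the gold standard dataset that the merged proposal #5 aims to build. The Python sketch below is only an illustration under that assumption: the function name `precision_recall_f1` and the input layout (a dict mapping a protein identifier to a set of annotation identifiers) are hypothetical and not taken from Unipept or any of the candidate resources.

```python
def precision_recall_f1(predicted, reference):
    """Micro-averaged precision, recall and F1 over all reference proteins.

    Both arguments map a protein identifier to a set of annotation
    identifiers (hypothetical layout, chosen for illustration).
    Predictions for proteins that are absent from the reference are ignored.
    """
    true_pos = false_pos = false_neg = 0
    for protein, truth in reference.items():
        guess = predicted.get(protein, set())
        true_pos += len(guess & truth)    # annotations found and correct
        false_pos += len(guess - truth)   # annotations found but wrong
        false_neg += len(truth - guess)   # annotations that were missed

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

Running the same function on the output of each prototype gives numbers that can be compared directly, although a real benchmark would probably also need to account for hierarchical annotations such as GO terms.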
After identifying suitable candidates, we can create a higher-level proof-of-concept workflow that starts from a list of peptides or proteins and ends with a list of interesting pathways. The Unipept API can be used to query for some of these annotations.
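The last step of such a workflow might look like the minimal Python sketch below. It assumes that peptide-level EC numbers have already been retrieved (for example through the Unipept API) and that some EC-to-pathway mapping is available. The `EC_TO_PATHWAYS` table, the pathway names and the function `rank_pathways` are hypothetical placeholders, not part of Unipept or of any pathway database.

```python
from collections import Counter

# Toy EC-number -> pathway mapping, invented for illustration only.
# A real mapping would be derived from a resource such as Reactome,
# KEGG, MetaCyc or BioCyc.
EC_TO_PATHWAYS = {
    "2.7.1.1": ["pathway_A"],
    "4.1.2.13": ["pathway_A", "pathway_B"],
    "1.1.1.49": ["pathway_B"],
}


def rank_pathways(peptide_annotations, ec_to_pathways):
    """Rank pathways by the number of peptides that support them.

    peptide_annotations maps a peptide sequence to a list of EC numbers.
    A peptide counts once per pathway, even if several of its EC numbers
    map to that pathway.
    """
    counts = Counter()
    for ec_numbers in peptide_annotations.values():
        pathways = set()
        for ec in ec_numbers:
            pathways.update(ec_to_pathways.get(ec, []))
        counts.update(pathways)
    # Highest peptide support first.
    return counts.most_common()
```

Counting peptides is only one of several possible ranking criteria; weighting by spectral counts or by pathway coverage would be natural alternatives to explore.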
Technical details
Currently, we have a set of bash scripts and Java tools to extract information from UniProt. The Unipept framework is built with Ruby on Rails, but its APIs can be queried from any programming language and return standard JSON. Our current visualisation tools are written in JavaScript and TypeScript.
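As an illustration of querying the API from another language, the Python sketch below builds a request URL and decodes the JSON response using only the standard library. The base URL, the `pept2ec` endpoint name and the `input[]` and `equate_il` parameters reflect our reading of the public Unipept API documentation and should be treated as assumptions to verify there. The helper names `build_url` and `fetch` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Unipept API; verify against the API documentation.
API_BASE = "https://api.unipept.ugent.be/api/v1"


def build_url(endpoint, peptides, equate_il=True):
    """Build a GET URL for an endpoint such as "pept2ec" (assumed name)."""
    # Each peptide is passed as a repeated "input[]" query parameter.
    params = [("input[]", peptide) for peptide in peptides]
    params.append(("equate_il", "true" if equate_il else "false"))
    return f"{API_BASE}/{endpoint}.json?{urlencode(params)}"


def fetch(endpoint, peptides):
    """Send the request and return the decoded JSON response."""
    with urlopen(build_url(endpoint, peptides), timeout=30) as response:
        return json.load(response)
```

A call such as `fetch("pept2ec", ["AALTER"])` would then return the parsed JSON for that peptide, provided the endpoint behaves as assumed.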
Contact information
Bart Mesuere - Ghent University (Belgium) - [email protected]
Pieter Verschaffelt - Ghent University (Belgium) - [email protected]