Biocomputing-Research-Group/DeepFilter

DeepFilter

DeepFilter is a metaproteomics filtering tool based on a deep learning model. It aims to improve peptide identification in microbial communities from collections of tandem mass spectra. Details are available at https://arxiv.org/pdf/2009.11241.pdf

Setup and installation

Dependency

  • Python == 3.6
  • NumPy == 1.17.2
  • scikit-learn >= 0.21.3
  • PyTorch (GPU version) >= 1.4.0
  • CUDA 10.2

Requirements

  • Linux operating system
  • More than 8 GB of GPU memory for inference mode; with less memory, reduce the batch size
  • More than 20 GB of GPU memory for training mode

Toy example of DeepFilter

Post-processing

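The toy example is provided to help users get started quickly. It runs inference on the toy example files (an MS2 file of observed spectra, the Comet search results in .pin format, and the trained model temp_model/benchmark.pt):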

#!/bin/bash
./inference.sh -in OSU_D2_FASP_Elite_02262014_01.ms2 -s OSU_D2_FASP_Elite_02262014_1.pin -m temp_model/benchmark.pt -o test.rescore.txt

The run produces the following files:

  • test.rescore.txt -> the rescoring results for the PSMs
  • testidx.txt, testcharge.txt, and testpeptide.fasta -> intermediate files used to generate the isotope distributions
  • test.expEncode.txt -> the observed spectra grouped by charge
  • test.theoryEncode.txt -> the isotope distributions of the peptide sequences grouped by charge and ion type
  • test.feature.txt -> 11 extra features extracted from the initial PSM score, the observed spectrum, and the peptide sequence
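The layout of test.expEncode.txt is not documented here. As a minimal sketch, assuming the standard MS2 text layout (S lines carry the scan numbers and precursor m/z, Z lines list candidate charges with their [M+H]+ masses, and peak lines hold m/z and intensity), the Python below reads spectra and groups their scan numbers by the charges on the Z lines. It does not reproduce DeepFilter's charge detection or encoding; `parse_ms2`, `group_by_charge`, and the sample data are hypothetical.

```python
import io
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    scan: int
    precursor_mz: float
    charges: list = field(default_factory=list)  # charge states from Z lines
    peaks: list = field(default_factory=list)    # (m/z, intensity) pairs


def parse_ms2(handle):
    """Read spectra from an MS2 text file (H/S/I/Z/D lines, then peak lines)."""
    spectra, current = [], None
    for raw in handle:
        fields = raw.split()
        if not fields or fields[0] in ("H", "I", "D"):
            continue  # skip blank lines, file headers, and per-scan info lines
        if fields[0] == "S":  # S <first scan> <last scan> <precursor m/z>
            current = Spectrum(scan=int(fields[1]), precursor_mz=float(fields[3]))
            spectra.append(current)
        elif fields[0] == "Z":  # Z <charge> <[M+H]+ mass>
            current.charges.append(int(fields[1]))
        else:  # peak line: <m/z> <intensity>
            current.peaks.append((float(fields[0]), float(fields[1])))
    return spectra


def group_by_charge(spectra):
    """Map each charge state to the scans that list it on a Z line."""
    groups = defaultdict(list)
    for spec in spectra:
        for z in spec.charges:
            groups[z].append(spec.scan)
    return dict(groups)


# Hypothetical two-scan MS2 snippet; scan 2 has an ambiguous charge (2 or 3).
SAMPLE = """H\tExtractor\texample
S\t1\t1\t500.25
Z\t2\t999.49
100.1\t20.0
200.2\t35.5
S\t2\t2\t650.80
Z\t2\t1300.59
Z\t3\t1950.39
150.0\t10.0
"""

spectra = parse_ms2(io.StringIO(SAMPLE))
groups = group_by_charge(spectra)
print(groups)  # {2: [1, 2], 3: [2]}
```

A spectrum with several Z lines appears under every listed charge, which is one simple way to keep ambiguous precursors available to each charge-specific group.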

Identification at the PSM, peptide, and protein levels at 1% FDR

PSM level and peptide level

Run filtering.py as follows:

python filtering.py test.rescore.txt OSU_D2_FASP_Elite_02262014_1.pin test.psm.txt test.pep.txt

The first argument is the rescoring results file generated by the deep learning model in inference mode, and the second argument is the database search results from Comet. The third and fourth arguments are user-defined output files, which contain the identification results at the PSM and peptide levels, respectively, filtered at 1% FDR.
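The exact procedure in filtering.py is not spelled out here. The sketch below illustrates the standard target-decoy approach that a 1% FDR cutoff usually refers to, assuming each PSM has a score where higher is better and a decoy flag (in a Percolator .pin file, Label is 1 for targets and -1 for decoys). The FDR at a score cutoff is estimated as decoys/targets above it and converted to q-values; peptide-level filtering keeps the best PSM per peptide and repeats the same step. The `PSM` class, function names, and data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float    # rescored value; higher is assumed to be better
    is_decoy: bool  # e.g. Label == -1 in a Percolator .pin file


def q_values(psms):
    """Return (psm, q-value) pairs in descending score order (target-decoy)."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this cutoff
    # A q-value is the smallest FDR at this or any more permissive cutoff.
    qs, running = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return list(zip(ranked, qs))


def accept(psms, fdr=0.01):
    """Keep target entries whose q-value is within the FDR threshold."""
    return [p for p, q in q_values(psms) if q <= fdr and not p.is_decoy]


def best_per_peptide(psms):
    """Collapse PSMs to one entry per peptide, keeping the top-scoring PSM."""
    best = {}
    for p in psms:
        key = (p.peptide, p.is_decoy)
        if key not in best or p.score > best[key].score:
            best[key] = p
    return list(best.values())


psms = [
    PSM("PEPTIDEK", 9.1, False), PSM("PEPTIDEK", 8.7, False),
    PSM("LESSK", 7.5, False), PSM("KEDITPEP", 3.0, True),
    PSM("AAGK", 2.5, False),
]
psm_hits = accept(psms)                    # PSM level
pep_hits = accept(best_per_peptide(psms))  # peptide level
print([p.score for p in psm_hits])    # [9.1, 8.7, 7.5]
print([p.peptide for p in pep_hits])  # ['PEPTIDEK', 'LESSK']
```

Some tools estimate the FDR as (decoys + 1)/targets for a more conservative cutoff; which estimator filtering.py uses is not stated.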

Protein level

Run sipros_peptides_assembling.py as follows:

python sipros_peptides_assembling.py

The output file "test.pro.txt" contains the protein-level identification results at 1% FDR.
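How sipros_peptides_assembling.py groups peptides into proteins is not described here. One widely used strategy is parsimony: report the smallest set of proteins that explains all accepted peptides. The Python below is a greedy set-cover sketch of that generic technique, not the script's algorithm; the peptide-to-protein mapping and protein names are hypothetical, and protein-level FDR filtering (for example with decoy proteins) is omitted.

```python
def parsimonious_proteins(peptide_to_proteins):
    """Greedy set cover: repeatedly pick the protein that explains the most
    still-unexplained peptides (ties broken alphabetically by protein name)."""
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    unexplained, selected = set(peptide_to_proteins), []
    while unexplained and protein_to_peptides:
        # max() returns the first maximal item, so sorting makes ties deterministic.
        best = max(sorted(protein_to_peptides),
                   key=lambda prot: len(protein_to_peptides[prot] & unexplained))
        covered = protein_to_peptides[best] & unexplained
        if not covered:  # only peptides without any protein mapping remain
            break
        selected.append(best)
        unexplained -= covered
    return selected


# Hypothetical peptides accepted at 1% FDR and the proteins containing them.
mapping = {
    "PEPTIDEK": ["ProtA", "ProtB"],
    "LESSK": ["ProtA"],
    "AAGK": ["ProtC"],
    "VVLK": ["ProtB", "ProtC"],
}
proteins = parsimonious_proteins(mapping)
print(proteins)  # ['ProtA', 'ProtC']
```

ProtB is dropped because every peptide it explains is already explained by ProtA or ProtC.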

Detailed user manual

Pre-processing

  • train_process.py: this script performs charge detection on the observed mass spectra. The first argument is the MS2 file of observed spectra, and the second argument is the output file with the charge detection results. Usage:
python train_process.py OSU_D2_FASP_Elite_02262014_01.ms2 expEncode.txt

  • theory_process.py and Sipros_OpenMP: the Python script and the binary are used together to generate the isotope distributions of the PSM candidates. Usage:
python theory_process.py OSU_D2_FASP_Elite_02262014_1.pin idx.txt charge.txt peptide.fasta feature.txt
./Sipros_OpenMP -i1 idx.txt -i2 charge.txt -i3 peptide.fasta -i4 theoryEncode.txt

  • Label_process.py: this script annotates the PSM candidates for model training. The first and second arguments are the target and decoy PSM files generated by Percolator, the third argument is the prefix for the annotation files, and the last argument is the number of files to annotate. Usage:
python Label_process.py percolator_results_target.csv percolator_results_decoy.csv Label 1
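theory_process.py and Sipros_OpenMP compute isotope distributions, but their method is not described. As a hedged illustration, the Python below computes the theoretical isotope envelope of an unmodified peptide precursor at a given charge by convolving the natural isotope abundances of C, H, N, O, and S over the peptide's elemental formula. Fragment ions (the ion types mentioned for test.theoryEncode.txt) and modifications are not handled; the function names are hypothetical and the abundances are rounded standard values.

```python
# Elemental composition (C, H, N, O, S) of each amino-acid residue.
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}
# Abundances indexed by extra neutrons relative to the lightest isotope.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
             "O": 15.99491461956, "S": 31.972071}
PROTON = 1.00727646688
NEUTRON_SPACING = 1.003355  # approximate 13C - 12C mass difference


def formula(peptide):
    """Elemental counts of a linear peptide: residues plus one water."""
    counts = [0, 2, 0, 1, 0]
    for aa in peptide:
        for k, n in enumerate(RESIDUES[aa]):
            counts[k] += n
    return dict(zip("CHNOS", counts))


def convolve(a, b, n_peaks):
    """Combine two isotope distributions, truncated to n_peaks entries."""
    out = [0.0] * n_peaks
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n_peaks:
                out[i + j] += x * y
    return out


def isotope_distribution(peptide, n_peaks=5):
    """Relative abundances of M, M+1, ..., convolving one atom at a time."""
    dist = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in formula(peptide).items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element], n_peaks)
    return dist


def isotope_envelope(peptide, charge, n_peaks=5):
    """(m/z, abundance) pairs of the [M+zH]z+ precursor isotope peaks."""
    mass = sum(MONO_MASS[e] * n for e, n in formula(peptide).items())
    mono_mz = (mass + charge * PROTON) / charge
    return [(mono_mz + k * NEUTRON_SPACING / charge, p)
            for k, p in enumerate(isotope_distribution(peptide, n_peaks))]


envelope = isotope_envelope("PEPTIDE", charge=2)
for mz, abundance in envelope:
    print(f"{mz:.4f}\t{abundance:.4f}")
```

Because convolution only moves probability toward heavier peaks, truncating to `n_peaks` leaves the lower peaks exact; only the mass beyond the last peak is dropped.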

Training mode

  • train.py: this script trains the DeepFilter model. The first and second arguments are the prefixes of the files containing the processed observed spectra and the isotope distributions. The third and fourth arguments are the prefixes of the 11-extra-feature files and the annotation files. The final argument is the number of files used for training. Usage:
python train.py expEncode.txt theoryEncode.txt feature.txt Label 1
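The annotation files that train.py consumes come from Label_process.py, whose labeling rule is not stated. A common rule for building training labels from Percolator output, sketched below as an assumption, marks target PSMs at q-value <= 0.01 as positives (1), all decoy PSMs as negatives (0), and leaves the remaining targets out. The column names PSMId and q-value follow Percolator's tab-delimited PSM output; the function names and sample rows are hypothetical, and the actual CSV files may use a different delimiter or columns.

```python
import csv
import io


def load_psms(handle, delimiter="\t"):
    """Map PSM id -> q-value from a Percolator-style results table.
    The 'PSMId' and 'q-value' column names are assumed."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row["PSMId"]: float(row["q-value"]) for row in reader}


def annotate(target_qs, decoy_qs, fdr=0.01):
    """Label 1 for confident targets and 0 for decoys; other targets are omitted."""
    labels = {psm: 1 for psm, q in target_qs.items() if q <= fdr}
    labels.update({psm: 0 for psm in decoy_qs})
    return labels


# Hypothetical target and decoy result tables.
TARGETS = "PSMId\tscore\tq-value\nscan_10_2\t4.1\t0.001\nscan_11_3\t0.2\t0.08\n"
DECOYS = "PSMId\tscore\tq-value\nscan_12_2\t-1.3\t0.9\n"

labels = annotate(load_psms(io.StringIO(TARGETS)), load_psms(io.StringIO(DECOYS)))
print(labels)  # {'scan_10_2': 1, 'scan_12_2': 0}
```

Leaving low-confidence targets unlabeled avoids training on PSMs that may be either correct or incorrect matches.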
