exscan.nf - An extended sequence scanner

Important

This project is still under development. A license and a Zenodo DOI will be added after the first release.

Introduction

exscan.nf is a bioinformatics pipeline designed to scan DNA or protein sequences for key features of interest, with a focus on aiding in genomic annotation and comparative genomics studies.

The pipeline is implemented using Nextflow, and performs the following steps:

1. Translation of ORFs (if the input is DNA)
   Uses SeqKit2 to translate the sequences into all possible open reading frames (ORFs).
2. Profile HMM search
   Queries each translated ORF or raw protein sequence against a profile HMM database using hmmscan.
3. Post-processing of each query result. Among others, operations include:
   - Filtering the results by e-value, score, and coverage.
   - Selecting the best-scoring hit for each translated ORF or full protein sequence.
   - Comparing hits with a GFF file to retain the features that intersect the hits.
   - Writing profile HMM hits as FASTA, GFF, CSV, etc.
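
As an illustration of the filtering and best-hit operations listed above, here is a minimal Python sketch. It is not the pipeline's actual code: the `Hit` record, its field names, the threshold defaults, and the definition of coverage as the aligned fraction of the query are all assumptions made for this example, and the hit values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One profile HMM hit (hypothetical record, not the pipeline's schema)."""
    query_id: str   # translated ORF or protein sequence identifier
    profile: str    # name of the matching profile HMM
    evalue: float
    score: float    # bit score
    ali_from: int   # 1-based, inclusive alignment start on the query
    ali_to: int     # 1-based, inclusive alignment end on the query
    query_len: int  # length of the query sequence


def coverage(hit):
    """Fraction of the query covered by the alignment (assumed definition)."""
    return (hit.ali_to - hit.ali_from + 1) / hit.query_len


def filter_hits(hits, max_evalue=1e-5, min_score=20.0, min_coverage=0.5):
    """Keep hits that pass all three thresholds (defaults are arbitrary)."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and h.score >= min_score
        and coverage(h) >= min_coverage
    ]


def best_hit_per_query(hits):
    """Return a dict mapping each query identifier to its highest-scoring hit."""
    best = {}
    for h in hits:
        if h.query_id not in best or h.score > best[h.query_id].score:
            best[h.query_id] = h
    return best


# Made-up example data.
hits = [
    Hit("orf1", "profileA", 1e-30, 120.5, 1, 90, 100),
    Hit("orf1", "profileB", 1e-10, 45.0, 10, 80, 100),
    Hit("orf2", "profileA", 0.5, 8.0, 5, 20, 150),
]
kept = filter_hits(hits)
best = best_hit_per_query(kept)
```

Filtering before selecting the best hit means a query whose only hits are weak ends up with no hit at all, rather than with a poor one.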

All operations are handled with Python, Biopython, jq, and bedtools.
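
The comparison of hits with a GFF file can be pictured with the pure-Python sketch below. It only mimics the kind of interval overlap that bedtools computes and is not how the pipeline does it. The function names and the `(seqid, start, end)` hit tuple are hypothetical, and the hit is assumed to be already expressed in the same 1-based, inclusive coordinates as the GFF features.

```python
def parse_gff_line(line):
    """Parse one tab-separated GFF feature line into (seqid, type, start, end, strand)."""
    cols = line.rstrip("\n").split("\t")
    return cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6]


def overlaps(start_a, end_a, start_b, end_b):
    """True if two 1-based, inclusive intervals share at least one position."""
    return start_a <= end_b and start_b <= end_a


def features_intersecting(hit, gff_lines):
    """Return the GFF features on the hit's sequence that overlap the hit."""
    seqid, start, end = hit
    kept = []
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        f_seqid, f_type, f_start, f_end, f_strand = parse_gff_line(line)
        if f_seqid == seqid and overlaps(start, end, f_start, f_end):
            kept.append((f_type, f_start, f_end, f_strand))
    return kept


# Made-up example data.
gff_lines = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1",
    "chr1\tsrc\tgene\t900\t1200\t.\t-\t.\tID=g2",
    "chr2\tsrc\tgene\t100\t500\t.\t+\t.\tID=g3",
]
hit = ("chr1", 450, 950)
retained = features_intersecting(hit, gff_lines)
```

This linear scan is fine for a handful of features; for genome-scale annotation files, a sorted or indexed approach such as the one bedtools uses is far more efficient.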

Usage

You can run the pipeline using:

nextflow run main.nf --fasta sequences.fasta --hmmdb hmmdb.hmm

Or alternatively:

nextflow run main.nf -params-file param_files/params.yaml
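
A parameters file can also be generated programmatically, as in the hedged Python sketch below. It writes the two parameters shown in the command above (`fasta` and `hmmdb`) to a JSON file, which Nextflow's `-params-file` option accepts alongside YAML, and assembles the matching command line. The helper names and the `params.json` file name are assumptions made for this example, and the demo writes into a temporary directory so that nothing is left behind.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_params_file(path, params):
    """Write pipeline parameters to a JSON file and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


def build_command(params_file):
    """Build the nextflow invocation as an argument list (it is not executed here)."""
    return ["nextflow", "run", "main.nf", "-params-file", str(params_file)]


# Parameter names taken from the CLI example; the values are placeholders.
params = {"fasta": "sequences.fasta", "hmmdb": "hmmdb.hmm"}

with tempfile.TemporaryDirectory() as tmp:
    params_file = write_params_file(Path(tmp) / "param_files" / "params.json", params)
    loaded = json.loads(params_file.read_text())
    command = build_command(params_file)
    printable = shlex.join(command)  # shell-quoted string for logging
```

The argument list returned by `build_command` can be passed directly to `subprocess.run` when the pipeline should actually be launched.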

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.

Pipeline output

#TODO To see the results of an example test run with a full size dataset refer to

For more details about the output files and reports, please refer to the output documentation.

Credits

exscan.nf was originally written by Joan LLuis Pons Ramon at the Station Biologique de Roscoff.

We thank the following people for their extensive assistance in the development of this pipeline:

#TODO

This work was supported by the HORIZON-MSCA-2022-DN program of the European Commission under Grant Agreement No 101120280.

License

Still to be decided.

References

Wei Shen, Botond Sipos, and Liuyang Zhao, “SeqKit2: A Swiss Army Knife for Sequence and Alignment Processing,” iMeta 3, no. 3 (June 2024): e191, https://doi.org/10.1002/imt2.191.
Sean R. Eddy, “Accelerated Profile HMM Searches,” ed. William R. Pearson, PLoS Computational Biology 7, no. 10 (October 2011): e1002195, https://doi.org/10.1371/journal.pcbi.1002195.
Peter J. A. Cock et al., “Biopython: Freely Available Python Tools for Computational Molecular Biology and Bioinformatics,” Bioinformatics 25, no. 11 (June 2009): 1422–23, https://doi.org/10.1093/bioinformatics/btp163.
Aaron R. Quinlan and Ira M. Hall, “BEDTools: A Flexible Suite of Utilities for Comparing Genomic Features,” Bioinformatics 26, no. 6 (March 2010): 841–42, https://doi.org/10.1093/bioinformatics/btq033.
