MetaPUF

An approach to integrate metagenomics, metatranscriptomics and metaproteomics data found in public resources such as MGnify (for metagenomics/metatranscriptomics) and the PRIDE database (for metaproteomics). When these omics techniques are applied to the same sample, their integration offers new opportunities to understand the structure (metagenome) and functional expression (metatranscriptome and metaproteome) of the microbiome.

Installation

You need a working installation of Snakemake. Then:

git clone https://github.com/PRIDE-reanalysis/MetaPUF.git

The pipeline uses conda environments to manage dependencies, which are handled automatically if you run snakemake with the --use-conda flag.

It also relies on some tools (ThermoRawFileParser, SearchGUI and PeptideShaker) that do not have conda packages or Docker images available for the versions we used. Snakemake downloads these tools on the fly, so you do not need to install them separately.

Example usage (test data-set)

There is a small test data set that uses a few assemblies from MGnify and two RAW files from PRIDE. The RAW files are gigabyte-scale and too big for this git repository, so fetch them with:

./test-data/pride/fetch-pride-test-data.sh

This downloads two RAW files into test-data/pride/.
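Before running the pipeline, you may want to confirm that the download finished. The snippet below is a minimal sketch and is not part of MetaPUF: find_raw_files is a hypothetical helper, and it assumes the RAW files end in .raw (in any letter case).

```python
from pathlib import Path


def find_raw_files(raw_dir: str) -> list[Path]:
    """Return the RAW files in raw_dir, matching the .raw suffix case-insensitively.

    Hypothetical helper for illustration; not part of the MetaPUF pipeline.
    """
    directory = Path(raw_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"RAW directory not found: {directory}")
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".raw"
    )
```

For example, find_raw_files("test-data/pride") should return two paths once the fetch script has completed.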

Then:

conda activate snakemake # (assuming you installed snakemake with conda, into an env called snakemake)
cd MetaPUF
snakemake --cores 4 --use-conda

This will run the pipeline on the small dataset, and put results into ../test-run.

Real usage (configuration)

Edit the config/config.proteomics.yaml and sample_info.csv files to point the pipeline at real data. sample_info.csv maps MGnify datasets to PRIDE datasets. In the config file, parameters.input_dir and parameters.raw_dir point to the MGnify and PRIDE data folders, respectively.
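A quick way to catch path mistakes before launching a long run is to check that both data folders exist. This is a minimal sketch, not part of the pipeline: check_data_dirs is a hypothetical helper, and it assumes the YAML config has already been parsed into a Python dict with the nested keys parameters.input_dir and parameters.raw_dir.

```python
from pathlib import Path


def check_data_dirs(config: dict) -> dict[str, Path]:
    """Check that the MGnify and PRIDE data folders named in the config exist.

    Hypothetical helper; assumes `config` is the parsed config file with
    parameters.input_dir (MGnify data) and parameters.raw_dir (PRIDE data).
    """
    params = config.get("parameters", {})
    dirs = {}
    for key in ("input_dir", "raw_dir"):
        if key not in params:
            raise KeyError(f"missing config key: parameters.{key}")
        path = Path(params[key])
        if not path.is_dir():
            raise FileNotFoundError(f"parameters.{key} is not a directory: {path}")
        dirs[key] = path
    return dirs
```

Snakemake exposes loaded configuration to a Snakefile as a dict named config, so a check like this could in principle be called there; whether that suits MetaPUF's workflow is a design choice left open here.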

Tips for running Snakemake

Run a dry run to check for syntax errors and see which jobs would be executed:

snakemake -np

To run the workflow:

snakemake --cores 4 --use-conda

Using LSF on an HPC cluster:

bsub -n 4 -R "rusage[mem=4096]" -J metapuf -u $USER -o job.log -e job.err snakemake --cores 4 --use-conda
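If you submit jobs from a script, it can help to build the bsub command programmatically so that the LSF slot count (-n) and Snakemake's --cores always match, as they do in the command above. This is a minimal sketch: build_bsub_command is a hypothetical helper, and the memory value is passed straight into the rusage string, whose unit (typically MB) depends on your cluster's LSF configuration.

```python
import os
import shlex
from typing import List, Optional


def build_bsub_command(cores: int, mem_mb: int, job_name: str = "metapuf",
                       user: Optional[str] = None) -> List[str]:
    """Build an LSF bsub command that runs the whole Snakemake workflow as one job.

    Hypothetical helper for illustration; not part of MetaPUF.
    """
    cmd = ["bsub", "-n", str(cores), "-R", f"rusage[mem={mem_mb}]", "-J", job_name]
    if user:
        # -u sets the mail destination for job notifications
        cmd += ["-u", user]
    cmd += ["-o", "job.log", "-e", "job.err"]
    # Use the same number for LSF slots (-n) and Snakemake's --cores
    cmd += ["snakemake", "--cores", str(cores), "--use-conda"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of submitting it; $USER mirrors the original example
    print(shlex.join(build_bsub_command(4, 4096, user=os.environ.get("USER"))))
```

To submit rather than print, the list could be passed to subprocess.run(cmd, check=True) on a node where bsub is available.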

Tip: if the pipeline fails partway through, first run a dry run (snakemake -np) to see which rules have already completed successfully. If you are sure that some output files were generated correctly, you can use snakemake --cleanup-metadata <filenames> to stop Snakemake from regenerating them. If that does not work, you can also try manually deleting the .snakemake/incomplete directory.
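The manual deletion mentioned above is equivalent to running rm -rf .snakemake/incomplete from the workflow directory. A minimal Python sketch of the same step follows; clear_incomplete_markers is a hypothetical helper, and it should only be used when no Snakemake process is running in that directory.

```python
import shutil
from pathlib import Path


def clear_incomplete_markers(workdir: str = ".") -> bool:
    """Delete the .snakemake/incomplete directory under workdir.

    Returns True if the directory existed and was removed, False otherwise.
    Hypothetical helper; only use it when you are sure the affected outputs
    are complete and no Snakemake run is active in workdir.
    """
    incomplete = Path(workdir) / ".snakemake" / "incomplete"
    if not incomplete.is_dir():
        return False
    shutil.rmtree(incomplete)
    return True
```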

Distributing and visualising the results

This repository also contains a utility for packaging the pipeline's output GFF files as an RO-Crate, suitable for distribution and visualisation on the MGnify website. See utils/package_as_rocrate/ for full details.
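For orientation, an RO-Crate is a directory containing a ro-crate-metadata.json file that describes its contents in JSON-LD. The sketch below writes a minimal RO-Crate 1.1 metadata file listing the GFF files in a folder. It is not the repository's packaging utility and does not reproduce the layout MGnify expects (see utils/package_as_rocrate/ for that); write_minimal_rocrate is a hypothetical helper, and properties the specification also asks for on the root dataset, such as license, are left out.

```python
import json
from datetime import date
from pathlib import Path


def write_minimal_rocrate(crate_dir: str, name: str, description: str) -> Path:
    """Write a minimal RO-Crate 1.1 ro-crate-metadata.json listing the GFF files in crate_dir.

    Hypothetical helper for illustration; it is not utils/package_as_rocrate.
    """
    root = Path(crate_dir)
    gff_names = sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in {".gff", ".gff3"}
    )
    graph = [
        {
            # Metadata descriptor: states that this file describes the crate root "./"
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root data entity: the crate directory itself
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date.today().isoformat(),
            "hasPart": [{"@id": n} for n in gff_names],
        },
    ]
    # One data entity per GFF file, identified by its path relative to the crate root
    graph.extend({"@id": n, "@type": "File", "name": n} for n in gff_names)

    metadata_path = root / "ro-crate-metadata.json"
    document = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}
    metadata_path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return metadata_path
```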

Development installation

git clone https://github.com/PRIDE-reanalysis/MetaPUF.git
cd MetaPUF
conda activate snakemake  # or another conda/venv if you prefer
pip install ".[dev,docs]"
pre-commit install

This installs the development requirements and the pre-commit hooks, which format the code automatically when you commit changes. You can also format the code manually by running black . from the repository root. It also installs mkdocs, which is used to build the documentation.

Editing docs

Edit the Markdown files in the docs/ folder, then run mkdocs serve to preview the documentation site locally.

Core contributors and collaborators

Code of Conduct

As part of our efforts toward delivering open and inclusive science, we follow the Contributor Covenant Code of Conduct for Open Source Projects.
