Genotyping in an HPC environment

Preamble

What is this?

This repo provides example code for a training module on running a genotyping and genotype likelihoods pipeline in an HPC environment. It also works in other environments. Go here for versions of the code used in Atlantic Salmon and Lumpfish. Go here for info on how these tools will run in a cloud virtual machine.

The code here shows one way of running a genotyping pipeline with a specific set of tools. There are many different ways to approach genotyping, but the general steps involve some version of: cleaning up raw reads, aligning them to a genome, updating and improving those alignments a bit, and calling variant sites of different kinds. These approaches are often cited as following the "GATK best practices", but they usually don't, unless it's a human genomics project. This paper remains my go-to manuscript for getting a better handle on the general ideas underlying most of these steps.
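
To make the general steps above a bit more concrete, here is a minimal dry-run sketch that only prints one plausible command per stage instead of running anything. The tool choices (fastp, bwa, samtools, GATK4), the `plan_steps` function, and every file name are assumptions made for illustration; the scripts in this repo may use different tools and options.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one example command per pipeline stage.
# Nothing is executed. Tools, paths, and sample names are assumptions.
set -euo pipefail

plan_steps() {
  local sample="$1"          # hypothetical sample prefix
  local ref="${2:-ref.fa}"   # hypothetical reference genome path
  local threads="${3:-4}"

  # 1. Clean up raw reads (adapter and quality trimming).
  printf '%s\n' "fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz -o ${sample}_R1.trim.fastq.gz -O ${sample}_R2.trim.fastq.gz"

  # 2. Align to the genome, adding a read group (GATK needs one), then sort.
  printf '%s\n' "bwa mem -t ${threads} -R '@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA' ${ref} ${sample}_R1.trim.fastq.gz ${sample}_R2.trim.fastq.gz | samtools sort -o ${sample}.sorted.bam -"

  # 3. Improve the alignments a bit: mark duplicates and index the result.
  printf '%s\n' "gatk MarkDuplicates -I ${sample}.sorted.bam -O ${sample}.dedup.bam -M ${sample}.dup_metrics.txt --CREATE_INDEX true"

  # 4. Call variant sites, here as a per-sample GVCF.
  printf '%s\n' "gatk HaplotypeCaller -R ${ref} -I ${sample}.dedup.bam -O ${sample}.g.vcf.gz -ERC GVCF"
}

plan_steps sampleA
```

Printing the plan first is a cheap way to check file naming before submitting anything to a scheduler. A real run would also need the reference prepared once beforehand (for example with `bwa index`, `samtools faidx`, and `gatk CreateSequenceDictionary`).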

What isn't this!?

The point of this repo and training is not to turn you into an expert bioinformatician or software engineer. I definitely don't expect that of myself, and this code won't get you there. The goal here is to build some general familiarity with genomic data processing, genotyping tools, and submitting jobs to a cluster. There is likely a tradeoff in time investment between developing elegant code and generating biologically and scientifically meaningful results.

But building these skills can be rewarding and fun. So, my only advice here is to try to pick up new things while developing code that looks good enough to you to get the information from the natural world that you care about.

Misc

Some misc. links I have found helpful that cover the tools used:

The most useful (and basically only) training resource I've used.

GATK parallelism and multithreading

GATK best practices in practice
