Bioinformatics

This is a cheat sheet for bioinformatics command line programs.

Bioinformatics, Yay!!

HOMER motif finding

Use HOMER to find enriched sequence motifs in genomic regions supplied as BED files.

For RNA

Build a command using Python

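# Search for short motifs of lengths 4, 5, 6, 7 with up to 1 mismatch
n_processors = 4
# Example inputs: peaks to search, output directory, background regions
bedfile = 'peaks.bed'
out_dir = 'out_dir'
background = 'background.bed'
homer_flags = '-rna -len 4,5,6,7 -mset vertebrates -mis 1 -p {}'.format(n_processors)
findMotifsGenome = '/home/yeo-lab/software/homer/bin/findMotifsGenome.pl'
# Argument order: <peak file> <genome> <output directory> [options]
command = '{} {} hg19 {} -bg {} {}'.format(
        findMotifsGenome, bedfile, out_dir, background, homer_flags)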

The final command looks like this (with the full path to findMotifsGenome.pl shortened):

findMotifsGenome.pl peaks.bed hg19 out_dir -bg background.bed -rna -len 4,5,6,7 -mset vertebrates -mis 1 -p 4
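Building the command as one formatted string makes it easy to get the positional arguments in the wrong order. As a minimal sketch, the same call can be assembled as an argument list. The helper name `build_homer_command` and its parameters are hypothetical, introduced only for illustration. The positional order follows HOMER's documented usage: peak file, then genome, then output directory.

```python
import shlex


def build_homer_command(bedfile, out_dir, background, genome="hg19",
                        lengths=(4, 5, 6, 7), mismatches=1, n_processors=4,
                        executable="findMotifsGenome.pl"):
    """Return the findMotifsGenome.pl call as an argument list.

    Hypothetical helper: the flags mirror the ones used in this cheat sheet.
    """
    return [
        executable, bedfile, genome, out_dir,
        "-bg", background,
        "-rna",
        "-len", ",".join(str(n) for n in lengths),
        "-mset", "vertebrates",
        "-mis", str(mismatches),
        "-p", str(n_processors),
    ]


args = build_homer_command("peaks.bed", "out_dir", "background.bed")
# Print a shell-safe version of the command for inspection
print(shlex.join(args))
```

Assuming HOMER is installed and on the PATH, the list can be passed directly to `subprocess.run(args, check=True)`, which avoids shell quoting problems with unusual file names.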

Subsample a fastq file

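Use seqtk (installable via bioconda) to subsample each gzipped FASTQ file (*.gz) in the current directory down to 1000 reads, using a random seed of 0. Using the same seed for both files of a paired-end sample keeps the read pairs in sync.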

mkdir -p subsampled
for F in *.gz; do echo "$F"; seqtk sample -s 0 "$F" 1000 | gzip -c > "subsampled/$F"; done
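Where seqtk is not available, the idea of subsampling to a fixed number of reads can be sketched in pure Python with reservoir sampling. This is an illustrative stand-in, not seqtk's implementation, and it will not select the same reads as seqtk for a given seed. The function name `subsample_fastq` is hypothetical, and the sketch assumes plain four-line FASTQ records.

```python
import gzip
import random


def subsample_fastq(in_path, out_path, n_reads=1000, seed=0):
    """Reservoir-sample up to n_reads records from a gzipped FASTQ file.

    Assumes every record is exactly four lines. Returns the number of
    records written. Output order is not guaranteed to match input order.
    """
    rng = random.Random(seed)
    reservoir = []
    with gzip.open(in_path, "rt") as handle:
        i = 0
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if i < n_reads:
                # Fill the reservoir with the first n_reads records
                reservoir.append(record)
            else:
                # Keep the new record with probability n_reads / (i + 1)
                j = rng.randint(0, i)
                if j < n_reads:
                    reservoir[j] = record
            i += 1
    with gzip.open(out_path, "wt") as out:
        for record in reservoir:
            out.writelines(record)
    return len(reservoir)
```

A call such as `subsample_fastq("sample.fastq.gz", "subsampled/sample.fastq.gz")` (file names are placeholders) reads the input once and holds at most `n_reads` records in memory.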