Lab 05: Read Counting
Ryan edited this page Jul 27, 2023 · 4 revisions
Today we'll use the program HTSeq to count the number of reads mapped to individual genes in our reference genome.
The main htseq-count options used in this lab are:
- --format= the alignment file type (e.g. "bam")
- --order= how the alignment file is sorted: "pos" (by coordinate) or "name" (by read name; the default). This matters for paired-end data.
- --stranded= whether the library is strand-specific: "yes", "no", or "reverse" (default: "yes")
- --type= the GFF3/GTF feature type (column 3) to count reads against (e.g. "gene"; default: "exon")
- --idattr= the GFF3/GTF attribute (column 9) used as the feature ID; features sharing the same ID are counted together (e.g. "ID"; default: "gene_id")
HTSeq supports highly customizable counting for many data types, so these arguments must match your library and annotation. In particular, counting an unstranded library with the default --stranded=yes discards roughly half of the reads. If in doubt, read the docs.
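Choosing --type and --idattr requires knowing what the annotation actually contains. The sketch below is one way to check, using standard Unix tools on a tiny made-up GFF3 file. The gene0001 records and the temporary file are hypothetical stand-ins for the real annotation. It lists the feature types found in column 3 and the attribute keys present on gene rows.

```shell
# Hypothetical three-line GFF3 (tab-separated); point $gff at your real annotation instead.
gff=$(mktemp)
printf 'Chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=GENE1\n' > "$gff"
printf 'Chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=gene0001.1;Parent=gene0001\n' >> "$gff"
printf 'Chr1\tdemo\texon\t100\t400\t.\t+\t.\tID=gene0001.1.exon1;Parent=gene0001.1\n' >> "$gff"

# Column 3 is the feature type: these are the candidate values for --type.
feature_types=$(grep -v '^#' "$gff" | cut -f3 | sort -u | paste -sd, -)

# Column 9 holds key=value attributes: the keys on gene rows are candidates for --idattr.
gene_attr_keys=$(awk -F'\t' '$3 == "gene" {print $9}' "$gff" \
    | tr ';' '\n' | cut -d= -f1 | sort -u | paste -sd, -)

echo "feature types:       $feature_types"
echo "gene attribute keys: $gene_attr_keys"
```

On a real annotation, a --type value that never appears in column 3, or an --idattr key missing from those rows, is a likely cause of errors or all-zero counts.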
Load HTSeq with spack:
spack load [email protected]%[email protected]
Make sure htseq-count is functional:
htseq-count --help
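If you later wrap these steps in a script, the same check can be done non-interactively. This is a minimal sketch; require_tool is a hypothetical helper name, not part of HTSeq or spack, and it relies only on the shell builtin command -v.

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "ERROR: '$1' not found on PATH; did you run 'spack load'?" >&2
    return 1
}

# Demonstrated with 'sh', which is always available.
# In a counting script you would write: require_tool htseq-count || exit 1
require_tool sh && echo "sh is available"
```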
Change to your analysis directory and run the following:
# go to your personal analysis directory
mkdir 05_count_htseq
cd 05_count_htseq
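Link the BAM file from your previous STAR mapping results:
ln -s ../03_mapping_star/SRR17062759.Aligned.sortedByCoord.out.bam .
Then count reads per gene for this sample. Counts are written to SRR17062759.counts.txt, while warnings and progress messages go to SRR17062759.out: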
htseq-count \
--format=bam \
--order=pos \
--stranded=no \
--type=gene \
--idattr=ID \
SRR17062759.Aligned.sortedByCoord.out.bam \
/pickett_shared/teaching/RNASeq_workshop/raw_data/reference/Athaliana_447_Araport11.gene_exons.gff3 \
> SRR17062759.counts.txt \
2> SRR17062759.out
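The counts file has two tab-separated columns (feature ID and read count), followed by special counters whose names start with a double underscore, such as __no_feature and __ambiguous. A quick sanity check is to compare reads assigned to genes against reads in those special counters. The sketch below does this with awk on a small made-up counts file; the gene IDs and numbers are hypothetical, so substitute SRR17062759.counts.txt for $counts.

```shell
# Hypothetical counts file in htseq-count's two-column layout.
counts=$(mktemp)
printf '%s\t%s\n' \
    gene0001 120 \
    gene0002 0 \
    gene0003 35 \
    __no_feature 40 \
    __ambiguous 5 \
    __too_low_aQual 0 \
    __not_aligned 0 \
    __alignment_not_unique 10 > "$counts"

# Sum reads assigned to real features (rows not starting with "__").
assigned=$(awk -F'\t' '$1 !~ /^__/ {sum += $2} END {print sum + 0}' "$counts")

# Sum reads that landed in the special counters.
unassigned=$(awk -F'\t' '$1 ~ /^__/ {sum += $2} END {print sum + 0}' "$counts")

echo "assigned=$assigned unassigned=$unassigned"

# Show the special counters themselves.
grep '^__' "$counts"
```

A very large __no_feature value relative to the assigned total often points to a mismatch in --stranded, --type, or the annotation file.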
Scale up to perform counts on all samples:
# -f replaces the link already created for the first sample
ln -sf ../03_mapping_star/*.bam .
for i in *.Aligned.sortedByCoord.out.bam ; do
    # strip the STAR suffix to get the sample name, e.g. SRR17062759
    BASE=${i%.Aligned.sortedByCoord.out.bam}
    htseq-count \
        --format=bam \
        --order=pos \
        --stranded=no \
        --type=gene \
        --idattr=ID \
        "$i" \
        /pickett_shared/teaching/RNASeq_workshop/raw_data/reference/Athaliana_447_Araport11.gene_exons.gff3 \
        > "${BASE}.counts.txt" \
        2> "${BASE}.out"
done
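Downstream tools usually expect one table with a row per gene and a column per sample. The following is a minimal sketch of one way to build it from the per-sample files with awk. The sampleA and sampleB files, their values, and the output name counts_matrix.tsv are all hypothetical, and the special "__" counters are dropped from the table.

```shell
# Hypothetical per-sample counts files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\t%s\n' gene0001 10 gene0002 0 __no_feature 3 > "$workdir/sampleA.counts.txt"
printf '%s\t%s\n' gene0001 7 gene0002 2 __no_feature 1 > "$workdir/sampleB.counts.txt"

# Build the header from the file names: strip the directory and the .counts.txt suffix.
tab=$(printf '\t')
header="gene"
for f in "$workdir"/*.counts.txt; do
    sample=$(basename "$f" .counts.txt)
    header="${header}${tab}${sample}"
done
printf '%s\n' "$header" > "$workdir/counts_matrix.tsv"

# Collect counts per gene across files, keeping first-seen gene order;
# a gene missing from a file is reported as 0.
awk -F'\t' -v OFS='\t' '
    FNR == 1 { nfiles++ }
    $1 !~ /^__/ {
        if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
        count[$1, nfiles] = $2
    }
    END {
        for (i = 1; i <= n; i++) {
            line = order[i]
            for (j = 1; j <= nfiles; j++)
                line = line OFS (((order[i], j) in count) ? count[order[i], j] : 0)
            print line
        }
    }
' "$workdir"/*.counts.txt >> "$workdir/counts_matrix.tsv"

cat "$workdir/counts_matrix.tsv"
```

Both the header loop and the awk call expand the same glob, so the columns stay in the same order as the sample names.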