Skip to content

Lab 05: Read Counting

Ryan edited this page Jul 27, 2023 · 4 revisions

Counting Reads

Today we'll use the program HTSeq to count the number of reads mapped to individual genes in our reference genome.

Helpful links:

The general inputs for HTSeq are as follows:

  • --format= specify alignment file type (e.g. "bam")
  • --order= tell htseq the manner in which the alignment file has been sorted (e.g. "pos")
  • --stranded= count reads dependent on forward/reverse strand-specific orientation (defaut: "no")
  • --type= the gff3/gtf feature to count by (e.g. "gene")
  • --idattr= the gff3/gtf column feature used to group the desired type

Note:

HTSeq allows for highly customizable count approaches for many data types. Depending on the library, you may need to adjust these arguments. If in doubt, read the docs.

Steps:

1. Load HTSeq using spack

Make sure htseq-count is functional

htseq-count --help

2. Prepare a counts directory

Change to your analysis directory and run the following:

# go to your personal analysis directory

mkdir 05_count_htseq
cd 05_count_htseq

Link the bam file from your previous STAR mapping results.

ln -s ../03_mapping_star/SRR17062759.Aligned.sortedByCoord.out.bam .

3. Run HTseq

htseq-count \
  --format=bam \
  --order=pos \
  --stranded=no \
  --type=gene \
  --idattr=ID \
  SRR17062759.Aligned.sortedByCoord.out.bam \
  /pickett_shared/teaching/RNASeq_workshop/raw_data/reference/Athaliana_447_Araport11.gene_exons.gff3 \
  > SRR17062759.counts.txt \
  2> SRR17062759.out

Appendix

Scale up to perform counts on all samples:

ln -s ../03_mapping_star/*bam .

for i in *bam ; do
BASE=$( basename ${i%%.Aligned.sortedByCoord.out.bam} )

  htseq-count \
    --format=bam \
    --order=pos \
    --stranded=no \
    --type=gene \
    --idattr=ID \
    $i \
    /pickett_shared/teaching/RNASeq_workshop/raw_data/reference/Athaliana_447_Araport11.gene_exons.gff3 \
    > ${BASE}.counts.txt \
    2> ${BASE}.out
done
Clone this wiki locally