
4. Creation of Mapping Targets

George Pacheco edited this page Oct 16, 2022 · 1 revision

We created several .bed files to examine our data in different ways:

Creates a .bed file containing all the EcoT22I cut sites present in the improved pigeon genome:
$SCRIPTS/appz/p5-bpwrapper/bin/bioseq --restrict-coord EcoT22I ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta > ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I.bed
Number of LOCI: 390,028
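Extends the .bed file created above by assigning locus IDs, splitting each cut site into a plus-strand and a minus-strand locus, and extending each locus 91 bp downstream of the cut site: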
awk 'BEGIN{cnt=0;OFS="\t"} {print $1,$2,$2,"RS_FPGP_"cnt"p\t0\t+";print $1,$3,$3,"RS_FPGP_"cnt++"m\t0\t-"}' ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I.bed | slopBed -s -l 0 -r 91 -g ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta.fai | sortBed > ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended.bed
Number of LOCI: 780,056
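
As a rough illustration of what this step does to each cut site, the sketch below runs the same awk command on a single made-up BED line and then mimics the strand-aware extension with plain awk instead of slopBed. The scaffold name and coordinates are hypothetical, and the awk stand-in does not reproduce the clipping at scaffold ends that slopBed performs using the .fai file.

```shell
# Toy input: one hypothetical cut site (scaffold name and coordinates are made up).
tmp=$(mktemp -d)
printf 'scaffold1\t1000\t1006\n' > "$tmp/cutsites.bed"

# Same awk as in the step above: one "+" locus anchored at column 2,
# one "-" locus anchored at column 3, sharing a numeric ID.
awk 'BEGIN{cnt=0;OFS="\t"} {print $1,$2,$2,"RS_FPGP_"cnt"p\t0\t+";print $1,$3,$3,"RS_FPGP_"cnt++"m\t0\t-"}' \
  "$tmp/cutsites.bed" > "$tmp/loci.bed"

# Stand-in for `slopBed -s -l 0 -r 91` (an approximation, not bedtools itself):
# "+" loci grow at the end, "-" loci grow at the start, floored at 0.
awk 'BEGIN{FS=OFS="\t"} $6=="+"{$3+=91} $6=="-"{$2-=91; if($2<0)$2=0} {print}' \
  "$tmp/loci.bed" > "$tmp/extended.bed"

cat "$tmp/extended.bed"
```

Each cut site therefore yields two 91 bp loci, one per strand, which is why the locus count doubles from 390,028 to 780,056.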
Merges overlapping loci in the .bed file created above, keeping the IDs of the merged loci, so that coverage calculations do not count overlapping regions twice:
bedtools merge -i ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended.bed -c 4 -o distinct > ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.bed
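
To make the effect of the merge concrete, the sketch below approximates it in plain awk on three made-up loci. It is not bedtools itself: it assumes the input is already sorted by scaffold and start, that overlapping or directly adjacent intervals are merged, and that all IDs are unique, so simply joining them with commas stands in for `-o distinct`. The scaffold name and coordinates are hypothetical.

```shell
tmp=$(mktemp -d)
# Hypothetical sorted loci: the first two overlap, the third stands alone.
printf 'scaffold1\t915\t1006\tRS_FPGP_0m\t0\t-\n'   >  "$tmp/extended.bed"
printf 'scaffold1\t1000\t1091\tRS_FPGP_0p\t0\t+\n'  >> "$tmp/extended.bed"
printf 'scaffold1\t5000\t5091\tRS_FPGP_1p\t0\t+\n'  >> "$tmp/extended.bed"

# Approximate merge: keep extending the current interval while the next one
# starts at or before its end on the same scaffold; collect IDs from column 4.
awk 'BEGIN{FS=OFS="\t"}
     NR==1             {c=$1; s=$2; e=$3; n=$4; next}
     $1==c && $2<=e    {if ($3>e) e=$3; n=n","$4; next}
                       {print c,s,e,n; c=$1; s=$2; e=$3; n=$4}
     END               {if (NR) print c,s,e,n}' \
  "$tmp/extended.bed" > "$tmp/merged.bed"

cat "$tmp/merged.bed"
```

The two overlapping loci collapse into a single interval from 915 to 1091, while the isolated locus is passed through unchanged.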
Creates a .pos file with 1-based start coordinates from the merged .bed file:
awk '{print $1"\t"($2+1)"\t"$3}' ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.bed > ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.pos
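
A quick way to see the effect of this conversion is to feed the same awk command a single made-up interval. BED intervals are 0-based and half-open, so adding 1 to the start gives a 1-based, closed interval covering the same bases; the ID column produced by the merge is dropped. The scaffold name, coordinates and IDs below are hypothetical.

```shell
# Hypothetical merged interval: BED start 915, end 1091, plus merged locus IDs.
pos_line=$(printf 'scaffold1\t915\t1091\tRS_FPGP_0m,RS_FPGP_0p\n' |
  awk '{print $1"\t"($2+1)"\t"$3}')

# Only scaffold, 1-based start and end remain.
echo "$pos_line"
```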
Calculates the percentage of the pigeon genome covered by this .pos file:
cat ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.pos | awk '{sum+=($3-$2)*100/1111661097} END {print sum"%"}'
Percentage Coverage: 5.90195%
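
The genome size in the command above is hard-coded. As a sketch of an alternative, the total length could be summed from the second column of the .fai index and passed to awk as a variable. The example below uses a made-up two-scaffold index and a made-up .pos file rather than the real data. It counts each interval as end - start + 1 bases, on the assumption that the .pos coordinates are 1-based and inclusive; this differs by one base per interval from the command above, so the two would not give exactly the same figure.

```shell
tmp=$(mktemp -d)
# Hypothetical .fai index: name, length, offset, bases per line, bytes per line.
printf 'scaffold1\t6000\t11\t60\t61\nscaffold2\t4000\t6122\t60\t61\n' > "$tmp/toy.fasta.fai"
# Hypothetical .pos file: scaffold, 1-based start, end.
printf 'scaffold1\t916\t1091\nscaffold2\t101\t424\n' > "$tmp/toy.pos"

# Total genome length = sum of the sequence lengths in column 2 of the index.
genome_size=$(awk '{sum+=$2} END {print sum}' "$tmp/toy.fasta.fai")

# Covered bases as a percentage of the genome (assumes closed, 1-based intervals).
pct=$(awk -v g="$genome_size" '{sum+=$3-$2+1} END {printf "%.4f%%\n", sum*100/g}' "$tmp/toy.pos")

echo "$genome_size"
echo "$pct"
```

Summing the covered bases first and dividing once at the end also avoids accumulating many small floating-point terms.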
