- Open MobaXterm ("Click on the Windows icon on the lower left corner, search for MobaXterm, click on the MobaXterm icon") Type the follwing command to login to your account (username@ipaddress)
- Go to the terminal and type
pwd
. - Make sure you are in
/scratch/ciw/ciw01
(ciw01 to ciw30 depending on your user)
- Now, type
mkdir output
into the terminal. - Type
ls
and pressEnter
. You should be able to see 3 filescommands.log
,path_files.dat
,run_script.sh
and 1 directoryoutput
which you just created.
- Now open the
run_script.sh
file from the left hand side panel of your screen. (Right click on it and selectOpen with default text editor
) - You will see something like this.
#!/usr/bin/bash
#BSUB -J example # Job name
#BSUB -n 1 # No. of threads
#BSUB -o output/output.log # Output file for standard output
#BSUB -e output/error.log # Output file for standard error
source path_files.dat
- Don't change anything. Just copy and paste the following command onto the new line below source path_files.dat
fastqc ${sequences}/24NGS775-B1_S55_R1_001.fastq.gz ${sequences}/24NGS775-B1_S55_R2_001.fastq.gz -o output/
- Now your file should look like this
#!/usr/bin/bash
#BSUB -J example # Job name
#BSUB -n 1 # No. of threads
#BSUB -o output/output.log # Output file for standard output
#BSUB -e output/error.log # Output file for standard error
source path_files.dat
fastqc ${sequences}/24NGS775-B1_S55_R1_001.fastq.gz ${sequences}/24NGS775-B1_S55_R2_001.fastq.gz -o output/
- Now press
Ctr + S
to save the file. Close it. - Wait for the progress bar on the lower left hand corner of your screen to finish.
- Then type the following command to view the script on the terminal to make sure you have modified it correctly.
cat run_script.sh
- This will print the whole script on terminal itself. Check the script again.
- Now run the script by typing the following onto the terminal.
bsub < run_script.sh
bsub is a command used to submit jobs to the LSF (Load Sharing Facility) job scheduler. LSF is a workload management platform used in high-performance computing (HPC) environments to manage, schedule, and distribute computing tasks across a cluster of machines. When you use bsub, you're asking the system to submit a job to the LSF queue, where it will be scheduled and run on an available node (a machine in the cluster).
- Once, you submit the job by typing
bsub < run_script.sh
, you can check it's status by simply typing
bjobs
You will see one of the following on the terminal.
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
64141 ciw03 RUN normal hpc cn3 example Sep 19 09:53
OR
No unfinished job found
-
If you see the first output, that means your job is still running and you should wait for it to finish. Type
bsub
again after some time. -
If you see the second output
No unfinished job found
, it means your job is finished. -
You can check the output of the command you just ran by typing
ls -ltrh output/
Similarly, do this for every step below and check if the output files are generated by typing ls -ltrh output/
We'll do it for one more step.
- Open the
run_script.sh
with default text editor and remove the last command you copy pasted into the script which wasfastqc
. - Now, your script will look like this.
#!/usr/bin/bash
#BSUB -J example # Job name
#BSUB -n 1 # No. of threads
#BSUB -o output/output.log # Output file for standard output
#BSUB -e output/error.log # Output file for standard error
source path_files.dat
- Now, copy paste the following command as it is onto the new line below
source path_files.dat
like you did earlier.
trimmomatic PE ${sequences}/24NGS775-B1_S55_R1_001.fastq.gz ${sequences}/24NGS775-B1_S55_R2_001.fastq.gz -baseout output/24NGS775-B1.fq.gz ILLUMINACLIP:$adapters:2:30:10:2:keepBothReads LEADING:3 SLIDINGWINDOW:4:15 MINLEN:40
- Your script should look like this.
#!/usr/bin/bash
#BSUB -J example # Job name
#BSUB -n 1 # No. of threads
#BSUB -o output/output.log # Output file for standard output
#BSUB -e output/error.log # Output file for standard error
source path_files.dat
trimmomatic PE ${sequences}/24NGS775-B1_S55_R1_001.fastq.gz ${sequences}/24NGS775-B1_S55_R2_001.fastq.gz -baseout output/24NGS775-B1.fq.gz ILLUMINACLIP:$adapters:2:30:10:2:keepBothReads LEADING:3 SLIDINGWINDOW:4:15 MINLEN:40
- Now press
Ctr + S
to save the file. Close it. - Wait for the progress bar on the lower left hand corner of your screen to finish.
- Then type the following command to view the script on the terminal to make sure you have modified it correctly.
cat run_script.sh
- Check the modified script again.
- Now, Run the script by typing
bsub < run_script.sh
Check the job's status by typing
bjobs
If it is finished, check the output files in the output
directory by typing
ls -ltrh output/
Do this for every step.
flash output/24NGS775-B1_1P.fq.gz output/24NGS775-B1_2P.fq.gz --cap-mismatch-quals -O -M 250 -o output/24NGS775-B1
bwa mem -R "@RG\tID:AML\tPL:ILLUMINA\tLB:LIB-MIPS\tSM:24NGS775-B1\tPI:200" -M -t 20 ${genome} output/24NGS775-B1.extendedFrags.fastq > output/24NGS775-B1.sam
samtools view -b output/24NGS775-B1.sam > output/24NGS775-B1.bam
samtools sort output/24NGS775-B1.bam > output/24NGS775-B1.sorted.bam
samtools index output/24NGS775-B1.sorted.bam > output/24NGS775-B1.sorted.bam.bai
Details of this step can be found here : https://gatk.broadinstitute.org/hc/en-us/articles/360035535912-Data-pre-processing-for-variant-discovery
java -Xmx8G -jar ${GATK38_path} -T RealignerTargetCreator -R ${genome} -nt 10 -I output/24NGS775-B1.sorted.bam --known ${site1} -o output/24NGS775-B1.intervals
java -Xmx8G -jar ${GATK38_path} -T IndelRealigner -R ${genome} -I output/24NGS775-B1.sorted.bam -known ${site1} --targetIntervals output/24NGS775-B1.intervals -o output/24NGS775-B1.realigned.bam
java -Xmx8G -jar ${GATK38_path} -T BaseRecalibrator -R ${genome} -I output/24NGS775-B1.realigned.bam -knownSites ${site2} -knownSites ${site3} -maxCycle 600 -o output/24NGS775-B1.recal_data.table
java -Xmx8G -jar ${GATK38_path} -T PrintReads -R ${genome} -I output/24NGS775-B1.realigned.bam --BQSR output/24NGS775-B1.recal_data.table -o output/24NGS775-B1.final.bam
bedtools bamtobed -i output/24NGS775-B1.final.bam > output/24NGS775-B1.final.bed
bedtools coverage -counts -a ${bedfile}.bed -b output/24NGS775-B1.final.bed > output/24NGS775-B1.counts.bed
java -Xmx10G -jar ${GATK38_path} -T MuTect2 -R ${genome} -I:tumor output/24NGS775-B1.final.bam -o output/24NGS775-B1_mutect.vcf -L ${bedfile}.bed
convert2annovar.pl -format vcf4 output/24NGS775-B1_mutect.vcf --outfile output/24NGS775-B1.avinput --withzyg --includeinfo
table_annovar.pl output/24NGS775-B1.avinput --out output/24NGS775-B1_final --remove --protocol refGene,cosmic84,exac03 --operation g,f,f --buildver hg19 --nastring '-1' --otherinfo --csvout ${database}
python3 ${formatMutect_script_path} output/24NGS775-B1_final.hg19_multianno.csv 24NGS775-B1 output/
python3 ${KDMdb_script_path} output/24NGS775-B1_mutect.csv output/ 24NGS775-B1