Commit

moving to /scratch
linsalrob committed Nov 11, 2024
1 parent d734601 commit 6a8fbf3
Showing 1 changed file with 30 additions and 13 deletions.
43 changes: 30 additions & 13 deletions Workshops/COMBINE_WA_2024.md
@@ -459,16 +459,32 @@ Sample | R1 or R2 | Number of sequences | Total length | Shortest | Longest | N5
[788707_20181126_S](../Datasets/CF/788707_20181126_S_R2.fastq.gz) | R2 | 125,000 | 37,500,000 | 300 | 300 | 300 | 300 | 300


## BEFORE WE GO ON!

We are running out of space on `/home`, where we logged in, so now we are going to work on `/scratch`.

Here's a shortcut to make life easier: create a symbolic link called `scratch` that points to your own directory on `/scratch`:

```
ln -s /scratch/courses01/$USER scratch
```

Now you will see a directory called `scratch` (it is really a link to `/scratch/courses01/$USER`). From now on, we are going to do all our work in there.

```
cd scratch
```

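(Note the difference between `/scratch`, the absolute path to the shared scratch file system, and `scratch`, the link in your home directory that points to your own directory on it.)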
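If symbolic links are new to you, here is a small self-contained sketch of what `ln -s` does and why `scratch` and `/scratch` differ. It uses a temporary directory as a stand-in for `/scratch/courses01/$USER` (the `base`, `target` and `home` paths are made up for the demo), so it runs anywhere; on the cluster you only need the single `ln -s` command above.

```shell
#!/usr/bin/env bash
# Sketch: a temporary directory stands in for /scratch/courses01/$USER
# (all paths below are made up for the demo).
set -euo pipefail

base=$(mktemp -d)
target="$base/scratch/courses01/${USER:-student}"   # plays the role of /scratch/courses01/$USER
mkdir -p "$target" "$base/home"

cd "$base/home"            # plays the role of your home directory
ln -s "$target" scratch    # link named "scratch" pointing at the absolute target

readlink scratch           # prints the absolute path the link points to
cd scratch
pwd                        # logical path, through the link: <base>/home/scratch
pwd -P                     # physical path: <base>/scratch/courses01/<user>
```

`pwd` reports the path through the link, while `pwd -P` resolves it to the real location on the scratch file system, which is where your files actually live.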


## Cross-assembly

We are going to run a cross-assembly on this data, pooling the reads from all of the samples to get more contigs. I've staged the data in `/scratch/courses01/cf_data`.

Here is the code that we need to run this assembly:

```
mkdir -p megahit_assembled/
ALLR1=""; ALLR2="";
for R1 in $(find /scratch/courses01/cf_data/ -name \*R1\*); do
@@ -480,9 +496,8 @@ done;
ALLR1=$(echo $ALLR1 | sed -e 's/,$//');
ALLR2=$(echo $ALLR2 | sed -e 's/,$//');
megahit -1 $ALLR1 -2 $ALLR2 -o megahit_assembled/cross_assembly -t 16
```
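The body of the loop that fills `ALLR1` and `ALLR2` is collapsed in this diff. As a hedged sketch of the general idea only (not the author's elided code): MEGAHIT's `-1` and `-2` options accept comma-separated lists of files, so one way to build them is to append each R1 file and its matching R2 file followed by a comma, then strip the trailing comma. The sample file names and the temporary directory are hypothetical, and the final `megahit` command is only echoed, not run.

```shell
#!/usr/bin/env bash
# Hedged sketch of building comma-separated read lists for MEGAHIT.
# The directory and sample names are hypothetical stand-ins for cf_data.
set -euo pipefail

readdir=$(mktemp -d)
touch "$readdir"/sampleA_R1.fastq.gz "$readdir"/sampleA_R2.fastq.gz \
      "$readdir"/sampleB_R1.fastq.gz "$readdir"/sampleB_R2.fastq.gz

ALLR1=""; ALLR2=""
for R1 in $(find "$readdir" -type f -name '*_R1*' | sort); do
    R2=${R1/%_R1.fastq.gz/_R2.fastq.gz}   # matching mate file (suffix swap)
    ALLR1="$ALLR1$R1,"                     # append, comma-terminated
    ALLR2="$ALLR2$R2,"
done
ALLR1=${ALLR1%,}   # drop the trailing comma
ALLR2=${ALLR2%,}

# Print the command instead of running it
echo megahit -1 "$ALLR1" -2 "$ALLR2" -o megahit_assembled/cross_assembly -t 16
```

Because each R2 name is derived from its R1 name inside the same loop, the two lists stay in the same sample order, which the pairing of `-1` and `-2` files relies on. `${ALLR1%,}` does the same job as the `sed -e 's/,$//'` step above.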

**Note:**
@@ -495,14 +510,16 @@ ln -s /scratch/courses01/$USER/cross_assembly megahit_assembled/cross_assembly
We are going to use `minimap2`, like we did before. This time, though, here is a little bit of code that runs `minimap2` on all of the samples, mapping each sample's reads back to the cross-assembly contigs!

```
READDIR=/scratch/courses01/cf_data/
mkdir -p bam_contigs
# map each sample's reads against the cross-assembly contigs and write a sorted BAM
for R1 in $(find $READDIR -name \*R1\* -printf "%f\n"); do
R2=${R1/R1/R2};
BAM=${R1/_R1.fastq.gz/.contigs.bam};
minimap2 --split-prefix=tmp$$ -t 8 -a -xsr megahit_assembled/cross_assembly/final.contigs.fa $READDIR/$R1 $READDIR/$R2 | samtools view -bh | samtools sort -o bam_contigs/$BAM;
done
# index every BAM file (skipping any existing .bai index files)
find bam_contigs -type f -name \*.bam -exec samtools index {} \;
```
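The loop works out each sample's R2 file and BAM name from the R1 file name using bash parameter expansion. The sketch below replays those two substitutions on a name that follows the dataset's pattern (e.g. `788707_20181126_S_R1.fastq.gz`, the R1 partner of a file listed above), and then shows a caveat with a hypothetical name: `${R1/R1/R2}` replaces the *first* `R1` anywhere in the string, whereas `${R1/%_R1.fastq.gz/_R2.fastq.gz}` only replaces the suffix.

```shell
#!/usr/bin/env bash
# Sketch: replay the loop's name substitutions on example file names.
set -euo pipefail

R1=788707_20181126_S_R1.fastq.gz
R2=${R1/R1/R2}                         # first "R1" -> "R2"
BAM=${R1/_R1.fastq.gz/.contigs.bam}    # swap the suffix for .contigs.bam
echo "$R2"    # 788707_20181126_S_R2.fastq.gz
echo "$BAM"   # 788707_20181126_S.contigs.bam

# Caveat, using a hypothetical sample whose name itself contains "R1"
odd=R1x_R1.fastq.gz
bad=${odd/R1/R2}                       # changes the wrong "R1"
good=${odd/%_R1.fastq.gz/_R2.fastq.gz} # anchored to the end of the name
echo "$bad"   # R2x_R1.fastq.gz
echo "$good"  # R1x_R2.fastq.gz
```

For the workshop files the simple form is fine, because the sample names do not contain `R1`; the anchored form is just a safer habit for other datasets.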

## Generating a depth profile
@@ -516,6 +533,7 @@ samtools coverage bam_contigs/788707_20171213_S.contigs.bam | less
Now we iterate over all the BAM files and keep the first column, the contig name, and the 7<sup>th</sup> column, which has the mean depth for that contig.
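To see what `cut -f 1,7` keeps, here is a sketch on a small made-up file laid out like `samtools coverage` output (tab-separated, with its `#rname ... meandepth ...` header line). The contig name and numbers are invented purely to show the column layout; they are not from the workshop data.

```shell
#!/usr/bin/env bash
# Sketch: made-up rows in the tab-separated layout of `samtools coverage`.
set -euo pipefail

cov=$(mktemp)
printf '#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq\n' > "$cov"
printf 'contig_1\t1\t1000\t250\t990\t99\t37.5\t36\t60\n' >> "$cov"

# Keep column 1 (contig name) and column 7 (mean depth)
depth=$(cut -f 1,7 "$cov")
echo "$depth"
# prints two tab-separated lines:
#   #rname     meandepth
#   contig_1   37.5
```

The header line is kept too, so each TSV starts with `#rname` and `meandepth`; keep that in mind when you load the files into the notebook.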

```
mkdir -p bam_contigs_tsv
for BAM in $(find bam_contigs -type f -name \*bam -printf "%f\n"); do
OUT=${BAM/.contigs.bam/.tsv};
samtools coverage bam_contigs/$BAM | cut -f 1,7 > bam_contigs_tsv/$OUT;
```
@@ -528,5 +546,4 @@ We have created an [example Jupyter notebook](Workshop_MAG_demo.ipynb) so you ca

We are going to move the data to [Google Colab](https://colab.research.google.com/) to analyse it and identify contigs that co-occur across multiple samples.

[You can find the example notebook here](Workshop_MAG_demo.ipynb)
