repo cleanup part 1
Maria-ISU committed Aug 8, 2021
1 parent 3c38725 commit fd69e4c
Showing 6 changed files with 75 additions and 28 deletions.
3 changes: 0 additions & 3 deletions 00a_Metadata.md
@@ -17,9 +17,6 @@
|Barcode ID| Sample ID|
| -- | -- |
|B13| 6 |
|B14| N22 |
|B15| N29 |
|B16| 53 |
|B17| 55 |
|B18 | 59 |
|B19 | 88 |
35 changes: 35 additions & 0 deletions 02b_scripts.md
@@ -0,0 +1,35 @@
# Scripts used to run the analysis

## Quality control

Running all barcodes in a loop

```bash
for n in 4 5 6 7 8 9 ; do nextflow run /work/gif/Maryam/Programfiles/nanoQCtrim/main.nf --fastqs ../00-RawData/barcode1$n/combine-barcode1$n.fastq --outdir output-barcode1$n --options '-middle_threshold 100' -profile nova,singularity; done
```

where
* main.nf
```bash

```

## Assembly with Flye genome assembler

Running Flye

```bash
for n in 4 5 6 7 8 9 ; do
  # ${n} is braced so the shell does not read "n_598c81f2_0_adaptersRemoved" as the variable name
  flye --nano-raw "/work/gif/Maryam/projects/Zengyi-2021-ScheffersomycesStipitis/01-QC/output-barcode${n}/trimmedReads/FAO68114_pass_barcode${n}_598c81f2_0_adaptersRemoved.fastq" \
    --out-dir "out_barcode${n}" \
    --genome-size 15m --threads 30 -i 4
done
```
## Alignment
### Aligning the insert onto the assemblies with minimap2
```bash
for n in 4 5 6 7 8 9 ; do
  minimap2 -aLx map-ont /work/gif/Maryam/projects/Zengyi-2021-ScheffersomycesStipitis/00-RawData/inserts-all.fasta "/work/gif/Maryam/projects/Zengyi-2021-ScheffersomycesStipitis/02-fly/out_flye_b${n}/assembly.fasta" > "aln-b${n}.sam"
done
```
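A quick sanity check on each SAM file is to count how many alignment records are actually mapped. The sketch below is not part of the original scripts; `count_mapped_records` is a hypothetical helper name. It relies only on the SAM convention that header lines start with `@` and that bit `0x4` of the FLAG column marks an unmapped record.

```bash
# Hypothetical helper (not in the original notebook): count mapped vs. total
# alignment records in a SAM file.
# usage: count_mapped_records aln-b4.sam
count_mapped_records() {
  awk -F'\t' '
    /^@/ { next }                                  # skip SAM header lines
    {
      total++
      if (int($2 / 4) % 2 == 0) mapped++           # FLAG bit 0x4 unset means mapped
    }
    END { printf "mapped=%d total=%d\n", mapped, total }
  ' "$1"
}
```

For example, `for n in 4 5 6 7 8 9; do count_mapped_records "aln-b${n}.sam"; done` would print one summary line per barcode, assuming the file names produced by the loop above.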
24 changes: 23 additions & 1 deletion 03_Results.md
@@ -1,9 +1,31 @@
# Results

### Assembly

#### BUSCOs

| Insert | Complete BUSCOs (C) | Complete and single-copy BUSCOs (S) | Complete and duplicated BUSCOs (D) | Fragmented BUSCOs (F) | Missing BUSCOs (M) | Total BUSCO groups searched |
| --- | --- | --- | --- | --- | --- | --- |
| 6 | 687 (90.7%) | 685 (90.4%) | 2 (0.3%) | 6 (0.8%) | 65 (8.5%) | 758 |
| 53 | 681 (89.8%) | 680 (89.7%) | 1 (0.1%) | 7 (0.9%) | 70 (9.3%) | 758 |
| 59 | 685 (90.3%) | 681 (89.8%) | 4 (0.5%) | 7 (0.9%) | 66 (8.8%) | 758 |
| 88 | 678 (89.4%) | 677 (89.3%) | 1 (0.1%) | 6 (0.8%) | 74 (9.8%) | 758 |

#### Assembly stats

| Insert | # of scaffolds | Total size of scaffolds | Longest scaffold | Shortest scaffold | Number of scaffolds > 1M nt | Number of scaffolds > 100K nt | Number of scaffolds > 10K nt | N50 | L50 | N90 | L90 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6 | 11 | 16022515 | 3579715 | 16004 | 8 | 10 | 11 | 1894960 | 3 | 1111997 | 7 |
| 53 | 10 | 15787988 | 4587006 | 4261 | 6 | 7 | 8 | 3523743 | 2 | 1113568 | 6 |
| 59 | 12 | 15994518 | 3515593 | 60343 | 7 | 9 | 12 | 1894562 | 3 | 1112789 | 7 |
| 88 | 23 | 15767318 | 3532437 | 2149 | 7 | 9 | 13 | 1755085 | 3 | 1109753 | 7 |
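For readers unfamiliar with the N50 and L50 columns: N50 is the length of the scaffold at which the running total of scaffold lengths, sorted from longest to shortest, first reaches half of the assembly size, and L50 is how many scaffolds that takes. The sketch below shows one way to compute both from a FASTA file. It is not the tool that produced the table; `assembly_n50` is a hypothetical helper name.

```bash
# Hypothetical helper (not the tool used for the table above): print N50 and L50
# for a FASTA file.
# usage: assembly_n50 assembly.fasta
assembly_n50() {
  awk '
    /^>/ { if (len) print len; len = 0; next }     # new record: emit the previous length
    { len += length($0) }                          # add up wrapped sequence lines
    END { if (len) print len }
  ' "$1" | sort -rn | awk '
    { lens[NR] = $1; total += $1 }
    END {
      half = total / 2
      for (i = 1; i <= NR; i++) {
        cum += lens[i]
        if (cum >= half) { printf "N50=%d L50=%d\n", lens[i], i; exit }
      }
    }
  '
}
```

With scaffold lengths 10, 8, 6, 4 and 2 (total 30), the running sum reaches 15 at the second scaffold, so the function reports `N50=8 L50=2`.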




### Rearrangement

We did not observe rearrangement in the assembled strains:
[dotplots in eps format](Notebook_Maryam/png/dotplots.eps)

![dotplots](Notebook_Maryam/png/dotplots.png)
10 changes: 2 additions & 8 deletions Notebook_Maryam/00-RawData.md
@@ -4,7 +4,7 @@
* Nova


Siva has downloaded the raw data as a tar file.
I tried :

```bash
@@ -17,10 +17,4 @@ cd /work/gif/Maryam/projects/Zengyi-2021-ScheffersomycesStipitis/00-RawData/
ln -s /work/gif/archiveNova/Zengyi_2021/1_36509_6_GridIONX5_1845.tar .
tar -xvf 1_36509_6_GridIONX5_1845.tar
```
19 changes: 7 additions & 12 deletions Notebook_Maryam/01-QC.md
@@ -36,14 +36,15 @@ nextflow run ../../../Programfiles/nanoQCtrim --fastqs ../00-RawData/barcode13/

```

I asked Andrew and there were several problems with the way I wanted to run it (Note: I eventually ran the nextflow pipeline successfully):
1. I need to run `/work/gif/Maryam/Programfiles/nanoQCtrim/main.nf` !
2. I need `nova` profile so the jobs are submitted and `singularity` to load the singularity module and pull the containers.
3. Andrew suggested that I concatenate the fastq files for each system and run but everything was so quick and easy that I decided to run as it is.


The correct command:
```
nextflow run /work/gif/Maryam/Programfiles/nanoQCtrim/main.nf --fastqs ../00-RawData/barcode13/FAO68114_pass_barcode13_598c81f2_*.fastq --outdir output-barcode13 -profile singularity,nova
```

This finished very fast (less than a minute), so I decided to just submit all of the jobs from a loop file:
@@ -94,13 +95,7 @@ cat *.fastq > conbine-barcode19.fastq
grep "@" conbine-barcode19.fastq | wc
80876 444863 465828992
```
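Note that counting reads with `grep "@"` can overcount, because `@` is also a valid quality character and may appear in quality lines. A minimal alternative, assuming the standard four-line FASTQ layout with no wrapped sequence lines, is to divide the line count by four. `count_fastq_reads` is a hypothetical helper name, not something used in the original notebook.

```bash
# Hypothetical helper: count reads in a 4-line-per-record FASTQ file.
# A non-integer result indicates the file is truncated or not 4-line formatted.
# usage: count_fastq_reads conbine-barcode19.fastq
count_fastq_reads() {
  awk 'END { print NR / 4 }' "$1"
}
```

Comparing this number with the `grep` count shows how many extra lines the `@` pattern picked up.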
There is a problem with downpore. Andrew made some changes in the pipeline (Note: The changes worked and I could run the pipeline).

##### barcode 13

@@ -134,9 +129,9 @@ Next read to be deleted: `read 21822`.
I still get the error message after deleting that read:
`2021/01/20 15:07:16 Splitting read 21822 into: 0 - 903 and 1131 - 1048
2021/01/20 15:07:16 CGTACCGGTGCTGTCACACGAGTATGGA
panic: runtime error: slice bounds out of range
`
I am not sure why. I am going to leave this for now and try a complete match for finding adaptors. This way it is possible that most of the adaptors are not trimmed, but they might not influence the assembly. After the assembly is done we can confirm that the adaptors are not in the assembly (Note: We later confirmed that the adaptors were not detected in the assembly).

For now:

@@ -150,4 +145,4 @@ running nanoQCtrim for the rest of barcodes in a for loop:
for n in 4 5 6 7 8 9 ; do nextflow run /work/gif/Maryam/Programfiles/nanoQCtrim/main.nf --fastqs ../00-RawData/barcode1$n/combine-barcode1$n.fastq --outdir output-barcode1$n --options '-middle_threshold 100' -profile nova,singularity; done
```

Because some of the reads were split in the middle (barcodes were in the middle), I used a higher identity matching threshold of 100% (default is 85%) to detect adaptors. This way only adaptors that have a perfect match are trimmed, and therefore the reads will not be split in the middle. We might have a few adaptors that are not trimmed this way, but we still think that will not affect the quality of the assembly (Note: As mentioned before, we confirmed that adaptors are not present in the assembly).
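The notes do not record how the absence of adaptors in the assembly was confirmed. One simple way to do such a check, shown here purely as a sketch, is to search each contig for an exact copy of the adaptor sequence on either strand. `count_contigs_with_seq` is a hypothetical helper, the adaptor sequence must be supplied by the user, and exact matching will miss adaptors that carry sequencing errors.

```bash
# Hypothetical helper: count contigs in a FASTA file that contain an exact copy
# of a query sequence on the forward or reverse-complement strand.
# usage: count_contigs_with_seq assembly.fasta ACGT...
count_contigs_with_seq() {
  local fasta=$1 query rc
  query=$(printf '%s' "$2" | tr 'acgt' 'ACGT')
  rc=$(printf '%s' "$query" | rev | tr 'ACGT' 'TGCA')   # reverse complement
  awk -v q="$query" -v rc="$rc" '
    function check() {
      if (seq != "" && (index(seq, q) || index(seq, rc))) hits++
    }
    /^>/ { check(); seq = ""; next }       # finish the previous contig
    { seq = seq toupper($0) }              # join wrapped sequence lines
    END { check(); print hits + 0 }
  ' "$fasta"
}
```

A result of `0` for every adaptor sequence would be consistent with the note above. A non-zero count only flags contigs worth inspecting more closely.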
12 changes: 8 additions & 4 deletions Notebook_Maryam/02-Flye.md
@@ -63,7 +63,7 @@ awk '(NR%4==1){print}' FAO68114_pass_barcode13_598c81f2_0_adaptersRemoved.fastq

`@Barcode-13-(forward)_26a4210c-f10d-46ae-b345-ea93f90e14ff`

Ok, it is only one. I did check and the reads are different. Only the IDs are the same. It seems that one is left and one is right. I decided to rename them instead of deleting the read!

```bash
# tr maps single characters rather than whole strings, so sed is used to rename the (left) header
sed 's/^@Barcode-13-(forward)_26a4210c-f10d-46ae-b345-ea93f90e14ff \(runid=.*barcode_alias=barcode13_(left)\)$/@Barcode-13-(forward)_26a4210c-f10d-46ae-b345-ea93f90e14ff_l \1/' "FAO68114_pass_barcode13_598c81f2_0_adaptersRemoved.fastq"
@@ -175,24 +175,27 @@ module load singularity
` FATAL: While making image from oci registry: while building SIF from layers: conveyor failed to get: Error initializing source oci:/home/msayadi/.singularity/cache/oci:43e181f6c95958eb5c639526bfc3330844a9b45c021d551e27dd1906c51f218c: Error writing blob: open /home/msayadi/.singularity/cache/oci/oci-put-blob578989333: disk quota exceeded
`
I tried to redirect the cache to a temp directory but it didn't work (Note: After making some space available it did work).
```bash
SINGULARITY_LOCALCACHEDIR="/work/gif/Maryam/dot-files/"
```
Still had the same problem. So I decided to delete the contents of the cache directory to free up space!
```bash
du -hs /home/msayadi/
```
`
.singularity/cache/oci
2.1G /home/msayadi/.singularity/cache/oci
`
```bash
rm -rf /home/msayadi/.singularity/cache/oci/*
```
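One possible reason the cache redirect above had no effect (an assumption, not something confirmed in these notes) is that the variable was assigned without `export`, so child processes such as `singularity` or `nextflow` would never see it. The sketch below demonstrates that shell behaviour with a made-up variable name, `DEMO_CACHE_DIR`. Singularity's documentation also lists `SINGULARITY_CACHEDIR` as the variable for the cache location, which may be worth trying in the same way.

```bash
# Demonstration with a hypothetical variable name: an unexported assignment is
# not visible to child processes, an exported one is.
unset DEMO_CACHE_DIR
DEMO_CACHE_DIR="/tmp/demo-cache"
before=$(bash -c 'echo "${DEMO_CACHE_DIR:-unset}"')   # child shell before export
export DEMO_CACHE_DIR
after=$(bash -c 'echo "${DEMO_CACHE_DIR:-unset}"')    # child shell after export
echo "before export: $before"
echo "after export:  $after"
```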
Ok, it works and now I have a list of species I can choose for BUSCO.
I chose `eukaryota_odb10` from the list. One group down is `fungi_odb10`.
@@ -247,6 +250,7 @@ or `156443` reads for barcode13 after trimming. number of reads before trimming
renaming the reads:
```
grep "Barcode-13-(forward)_e70940d2-cd0a-461e-b4f1-13c2e8bfa03c" combine-barcode13_adaptersRemoved.fastq
@Barcode-13-(forward)_e70940d2-cd0a-461e-b4f1-13c2e8bfa03c runid=598c81f298f24cf974f62bf9136b2ea7dfc37f41 read=11326 ch=222 start_time=2020-12-23T07:36:02Z flow_cell_id=FAO68114 protocol_group_id=1585 sample_id=no_sample barcode=barcode13 barcode_alias=barcode13_(left)
@Barcode-13-(forward)_e70940d2-cd0a-461e-b4f1-13c2e8bfa03c runid=598c81f298f24cf974f62bf9136b2ea7dfc37f41 read=11326 ch=222 start_time=2020-12-23T07:36:02Z flow_cell_id=FAO68114 protocol_group_id=1585 sample_id=no_sample barcode=barcode13 barcode_alias=barcode13_(right)
```
@@ -308,7 +312,7 @@ I am going to add line number to all the ids to make them all uniq.
paste - - - - < combine-barcode13_adaptersRemoved.fastq | awk 'BEGIN{NF = OFS = "\t"} { print $1"_"NR,$11,$12,$13}' | tr '\t' '\n' > combine-barcode13_uniqIDs.fastq
```
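To confirm that the IDs really are unique before handing the file to Flye, the header lines can be checked for repeats. This is a small sketch assuming four-line FASTQ records; `duplicate_read_ids` is a hypothetical helper name.

```bash
# Hypothetical helper: list read IDs (first word of each header line) that occur
# more than once in a 4-line-per-record FASTQ file. No output means all IDs are unique.
# usage: duplicate_read_ids combine-barcode13_uniqIDs.fastq
duplicate_read_ids() {
  awk 'NR % 4 == 1 { print $1 }' "$1" | sort | uniq -d
}
```

Running it on the file before and after renaming shows whether the duplicated `(left)`/`(right)` IDs have been resolved.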
And try Flye again. Flye is running.
Repeat that for all the barcodes: