Commit d39afa9

committed
updated for test data
1 parent 5b587fc commit d39afa9

1 file changed

+30
-46
lines changed

docs/usage.rst

@@ -16,19 +16,16 @@ Data files from each group of biological replicates should be placed into a uniq
 
 .. code-block:: console
 
-    reads
+    reads/
     ├── exp1
     │   ├── Dam.fastq.gz
-    │   ├── HIF1A.fastq.gz
-    │   └── HIF2A.fastq.gz
+    │   └── Piwi.fastq.gz
     ├── exp2
     │   ├── Dam.fastq.gz
-    │   ├── HIF1A.fastq.gz
-    │   └── HIF2A.fastq.gz
+    │   └── Piwi.fastq.gz
     └── exp3
         ├── Dam.fastq.gz
-        ├── HIF1A.fastq.gz
-        └── HIF2A.fastq.gz
+        └── Piwi.fastq.gz
 
 .. note::
 
@@ -44,66 +41,55 @@ In some cases the number of non-Dam and Dam samples might not match. In this cas
 
 .. code-block:: console
 
-    reads
+    reads/
     ├── Dam_1.fastq.gz
-    ├── HIF1A_1.fastq.gz
-    ├── HIF2A_1.fastq.gz
     ├── Dam_2.fastq.gz
-    ├── HIF1A_2.fastq.gz
-    ├── HIF2A_2.fastq.gz
-    ├── HIF1A_3.fastq.gz
-    └── HIF2A_3.fastq.gz
+    ├── Piwi_1.fastq.gz
+    ├── Piwi_2.fastq.gz
+    └── Piwi_3.fastq.gz
 
 When `damid-seq` is run in this case, it will create directories in reads/ that match each Dam-only sample with all non-Dam samples. Symlinks to the original files in reads/ will be created in these directories:
 
 .. code-block:: console
 
-    reads
+    reads/
     ├── Dam_1.fastq.gz
     ├── Dam_2.fastq.gz
-    ├── HIF1A_1.fastq.gz
-    ├── HIF1A_2.fastq.gz
-    ├── HIF1A_3.fastq.gz
-    ├── HIF2A_1.fastq.gz
-    ├── HIF2A_2.fastq.gz
-    ├── HIF2A_3.fastq.gz
+    ├── Piwi_1.fastq.gz
+    ├── Piwi_2.fastq.gz
+    ├── Piwi_3.fastq.gz
     ├── repl_1
     │   ├── Dam.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Dam_1.fastq.gz
-    │   ├── HIF1A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF1A_1.fastq.gz
-    │   └── HIF2A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF2A_1.fastq.gz
+    │   └── Piwi.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Piwi_1.fastq.gz
     ├── repl_2
     │   ├── Dam.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Dam_2.fastq.gz
-    │   ├── HIF1A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF1A_1.fastq.gz
-    │   └── HIF2A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF2A_1.fastq.gz
+    │   └── Piwi.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Piwi_1.fastq.gz
     ├── repl_3
     │   ├── Dam.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Dam_1.fastq.gz
-    │   ├── HIF1A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF1A_2.fastq.gz
-    │   └── HIF2A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF2A_2.fastq.gz
+    │   └── Piwi.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Piwi_2.fastq.gz
     ├── repl_4
     │   ├── Dam.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Dam_2.fastq.gz
-    │   ├── HIF1A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF1A_2.fastq.gz
-    │   └── HIF2A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF2A_2.fastq.gz
+    │   └── Piwi.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Piwi_2.fastq.gz
     ├── repl_5
     │   ├── Dam.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Dam_1.fastq.gz
-    │   ├── HIF1A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF1A_3.fastq.gz
-    │   └── HIF2A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF2A_3.fastq.gz
+    │   └── Piwi.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Piwi_3.fastq.gz
     ├── repl_6
     │   ├── Dam.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Dam_2.fastq.gz
-    │   ├── HIF1A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF1A_3.fastq.gz
-    │   └── HIF2A.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/HIF2A_3.fastq.gz
+    │   └── Piwi.fastq.gz -> /mnt/4TB_SSD/analyses/DamID/test/reads/Piwi_3.fastq.gz
     └── sample_matrix.csv
 
 The `sample_matrix.csv` file contains a log of which file was symlinked to which directory:
 
 .. code-block:: console
 
     dir
-    "['reads/repl_1', 'reads/Dam_1.fastq.gz', 'reads/HIF1A_1.fastq.gz', 'reads/HIF2A_1.fastq.gz']"
-    "['reads/repl_2', 'reads/Dam_2.fastq.gz', 'reads/HIF1A_1.fastq.gz', 'reads/HIF2A_1.fastq.gz']"
-    "['reads/repl_3', 'reads/Dam_1.fastq.gz', 'reads/HIF1A_2.fastq.gz', 'reads/HIF2A_2.fastq.gz']"
-    "['reads/repl_4', 'reads/Dam_2.fastq.gz', 'reads/HIF1A_2.fastq.gz', 'reads/HIF2A_2.fastq.gz']"
-    "['reads/repl_5', 'reads/Dam_1.fastq.gz', 'reads/HIF1A_3.fastq.gz', 'reads/HIF2A_3.fastq.gz']"
-    "['reads/repl_6', 'reads/Dam_2.fastq.gz', 'reads/HIF1A_3.fastq.gz', 'reads/HIF2A_3.fastq.gz']"
+    "['reads/repl_1', 'reads/Dam_1.fastq.gz', 'reads/Piwi_1.fastq.gz']"
+    "['reads/repl_2', 'reads/Dam_2.fastq.gz', 'reads/Piwi_1.fastq.gz']"
+    "['reads/repl_3', 'reads/Dam_1.fastq.gz', 'reads/Piwi_2.fastq.gz']"
+    "['reads/repl_4', 'reads/Dam_2.fastq.gz', 'reads/Piwi_2.fastq.gz']"
+    "['reads/repl_5', 'reads/Dam_1.fastq.gz', 'reads/Piwi_3.fastq.gz']"
+    "['reads/repl_6', 'reads/Dam_2.fastq.gz', 'reads/Piwi_3.fastq.gz']"
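The pairing recorded in the updated `sample_matrix.csv` (two Dam-only files, three Piwi files, six `repl_N` directories) can be reproduced with a short Python sketch. This is not the code `damid-seq` itself runs; the function name `pair_samples` is hypothetical, and the loop order is inferred only from the six rows listed in the updated file.

```python
from itertools import product


def pair_samples(dam_files, non_dam_files):
    """Pair every non-Dam file with every Dam-only file.

    Returns (directory, dam_file, non_dam_file) tuples. The order (non-Dam
    files in the outer loop, Dam-only files in the inner loop) is an
    assumption inferred from the sample_matrix.csv rows in the docs.
    """
    rows = []
    pairs = product(non_dam_files, dam_files)
    for i, (non_dam, dam) in enumerate(pairs, start=1):
        rows.append((f"reads/repl_{i}", dam, non_dam))
    return rows


dam = ["reads/Dam_1.fastq.gz", "reads/Dam_2.fastq.gz"]
piwi = [f"reads/Piwi_{n}.fastq.gz" for n in (1, 2, 3)]

matrix = pair_samples(dam, piwi)
for row in matrix:
    print(list(row))
```

With 2 Dam-only and 3 non-Dam files this yields 2 x 3 = 6 rows, one per `repl_N` directory.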
+
 
 
 Sample meta data and analysis settings
@@ -114,22 +100,20 @@ The config/ directory contains `samples.csv` with sample meta data as follows:
 +-----------+----------+-----------+
 | sample    | genotype | treatment |
 +===========+==========+===========+
-|HIF1A      | WT       | Hypoxia   |
+|Piwi       | Piwi_ko  | None      |
 +-----------+----------+-----------+
-|HIF2A      | WT       | Hypoxia   |
+|Dam        | WT       | None      |
 +-----------+----------+-----------+
-|Dam        | WT       | Hypoxia   |
-+-----------+----------+-----------+
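As a rough illustration of how the `sample` column relates to the read files, the Python sketch below parses a CSV with the three columns shown in the table and reports samples that have no matching `<sample>.fastq.gz`. The exact CSV text, the helper name `missing_fastq`, and the rule that file base names must equal sample names are assumptions drawn from the examples in this document, not confirmed pipeline behaviour.

```python
import csv
import io

# Hypothetical CSV text mirroring the updated table (assumed layout).
SAMPLES_CSV = """sample,genotype,treatment
Piwi,Piwi_ko,None
Dam,WT,None
"""

SUFFIX = ".fastq.gz"


def missing_fastq(samples_csv_text, fastq_names):
    """Return sample names that lack a matching <sample>.fastq.gz file."""
    reader = csv.DictReader(io.StringIO(samples_csv_text))
    expected = [row["sample"].strip() for row in reader]
    # Strip the suffix so "Piwi.fastq.gz" is compared as "Piwi".
    present = {
        name[: -len(SUFFIX)] for name in fastq_names if name.endswith(SUFFIX)
    }
    return [sample for sample in expected if sample not in present]


print(missing_fastq(SAMPLES_CSV, ["Dam.fastq.gz", "Piwi.fastq.gz"]))
print(missing_fastq(SAMPLES_CSV, ["Dam.fastq.gz"]))
```

The first call prints an empty list because both samples have a file; the second reports `Piwi` as missing.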
 
 `config.yaml` in the same directory contains the settings for the analysis:
 
 .. code-block:: yaml
 
-    genome: hg38
+    genome: dm6
     ensembl_genome_build: 110
     plasmid_fasta: none # Path to plasmid fasta file with sequences to be removed
     fusion_genes:
-      genes: ENSG00000100644,ENSG00000116016 # Ensembl gene IDs for genes to be masked from the fasta file
+      genes: FBgn0004872 # Ensembl gene IDs for genes to be masked from the fasta file
       feature_to_mask: "exon" # Gene feature to mask from the fasta file (exon or gene)
     damidseq_pipeline:
       normalization: kde # kde, rpm or rawbins
@@ -210,7 +194,7 @@ A lot of the DamID signal can come from the plasmids that are used to express th
 
 To prevent this, two approaches are available:
 
-1. The genes (Ensembl gene IDs) fused to Dam can be set in config.yaml["fusion_genes] (separated by commas if multiple plasmids are used). This will mask the genomic locations of these genes in the fasta file that will be used to build the Bowtie2 index, hence excluding these regions from the analysis.
+1. The genes (Ensembl gene IDs) fused to Dam can be set in config.yaml["fusion_genes"] (separated by commas if multiple plasmids are used). This will mask the features set in config > fusion_genes > feature_to_mask (exon or gene) of these genes in the fasta file that will be used to build the Bowtie2 index, hence excluding these regions from the analysis.
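The difference between masking `exon` and `gene` features can be illustrated with a minimal Python sketch. It assumes masking means overwriting bases with `N` and uses made-up coordinates on a toy sequence; the function name `mask_intervals` is hypothetical and is not taken from the pipeline.

```python
def mask_intervals(sequence, intervals):
    """Replace 0-based, half-open [start, end) intervals with 'N'."""
    seq = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range intervals are safe.
        start = max(start, 0)
        end = min(end, len(seq))
        if start < end:
            seq[start:end] = "N" * (end - start)
    return "".join(seq)


chrom = "ACGTACGTACGTACGT"
exons = [(2, 5), (10, 13)]  # hypothetical exon coordinates of one gene
gene = [(2, 13)]            # hypothetical full gene span (exons and introns)

print(mask_intervals(chrom, exons))  # ACNNNCGTACNNNCGT
print(mask_intervals(chrom, gene))   # ACNNNNNNNNNNNCGT
```

Masking exons leaves the intronic bases between them alignable, whereas masking the gene removes the whole span from the index.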
 
 .. note::
 