Simply type "perl certain_script.pl" or "perl certain_script.pl -h" for details.
### fasta_process.pl
> Query, extract and process FASTA sequences.
**Note:** some options can be combined, but they are applied in a fixed priority order. For example, extract and sort can be run in a single step, while sort followed by extract will not work; in such cases, break the task into two or more steps.
### convert_fastq_quality.pl
However, since the VCF format generated by different callers varies, this script …
* Summary of results generated from GATK DiagnoseTargets (https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_diagnostics_diagnosetargets_DiagnoseTargets.php)
1) Only bi-allelic loci are supported when analysing sequence context; multi-allelic loci need to be split first;
2) The extension differs between SNPs and INDELs: 5bp upstream and 5bp downstream for SNPs, but only 10bp downstream for INDELs, so INDELs are assumed to be already left-aligned.
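The two rules above can be sketched in Python as follows. This is a minimal illustration, not the code in vcf_process.pl: the `Variant` class and the `context_window`/`context_sequence` helpers are hypothetical, and the exact window bounds (the SNP window includes the variant base; the INDEL window starts at the base after the VCF anchor base) are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Variant:
    chrom: str
    pos: int  # 1-based VCF POS
    ref: str
    alt: str  # a single ALT allele; multi-allelic records must be split first


def context_window(v: Variant, flank: int = 5, indel_downstream: int = 10) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) of the context around a variant."""
    if "," in v.alt:
        raise ValueError("multi-allelic record: split it into bi-allelic records first")
    if len(v.ref) == 1 and len(v.alt) == 1:
        # SNP: `flank` bp upstream and downstream (assumption: variant base included)
        return max(1, v.pos - flank), v.pos + flank
    # Anything else is treated as an INDEL. INDELs are assumed left-aligned, so only
    # the downstream side is inspected, starting after the anchor base (assumption).
    return v.pos + 1, v.pos + indel_downstream


def context_sequence(reference: str, start: int, end: int) -> str:
    """Slice a 1-based inclusive interval out of a reference sequence string."""
    return reference[start - 1:end]


ref = "ACGT" * 5
snp = Variant("chr1", 10, "C", "T")
print(context_window(snp))                          # (5, 15)
print(context_sequence(ref, *context_window(snp)))  # ACGTACGTACG
```

Splitting multi-allelic records before this step is what makes a single `alt` string per record a safe assumption.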
#### Clustering variants
Use vcf_process.pl to cluster markers into genetically linked regions.
The clustering function identifies genome blocks from markers of a certain type. It first searches for reliable seeds (segments of consecutive markers of the same type that pass the criteria; the "seeding" stage), then merges adjacent seeds of the same type into blocks (the "extension" stage). The boundary between two blocks of different types is determined from the markers lying between them, or placed at the midpoint when no markers remain between the two blocks.
The "seeding-and-extension" algorithm was borrowed from "Wijnker, E. et al. The genomic landscape of meiotic crossovers and gene conversions in Arabidopsis thaliana. eLife 2, e01426 (2013)", where it was used to identify recombinant blocks.
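A minimal Python sketch of the seeding-and-extension idea follows. It illustrates the technique under stated assumptions and is not the vcf_process.pl implementation: the only seeding criterion is a hypothetical minimum run length `min_seed`, and when markers lie between two blocks of different types, the boundary is placed at the split that leaves the fewest disagreeing markers (ties go to the leftmost split), which is just one way of letting the intervening markers decide. The boundary itself is the midpoint between the last marker of one block and the first marker of the next.

```python
from typing import List, Tuple

Marker = Tuple[int, str]     # (position, marker type); list sorted by position
Span = Tuple[int, int, str]  # (first marker index, last marker index, type)


def find_seeds(markers: List[Marker], min_seed: int) -> List[Span]:
    """Seeding: maximal runs of at least `min_seed` consecutive same-type markers."""
    seeds = []
    i = 0
    while i < len(markers):
        j = i
        while j + 1 < len(markers) and markers[j + 1][1] == markers[i][1]:
            j += 1
        if j - i + 1 >= min_seed:
            seeds.append((i, j, markers[i][1]))
        i = j + 1
    return seeds


def extend_seeds(seeds: List[Span]) -> List[Span]:
    """Extension: merge neighbouring seeds of the same type into one block."""
    blocks: List[Span] = []
    for start, end, mtype in seeds:
        if blocks and blocks[-1][2] == mtype:
            blocks[-1] = (blocks[-1][0], end, mtype)
        else:
            blocks.append((start, end, mtype))
    return blocks


def best_split(markers: List[Marker], left: Span, right: Span) -> int:
    """Index of the last marker of the left block, chosen (hypothetical rule) so
    that the fewest intervening markers disagree with the block they join."""
    _, l_end, l_type = left
    r_start, _, r_type = right
    best, best_cost = l_end, None
    for split in range(l_end, r_start):
        cost = sum(markers[i][1] != l_type for i in range(l_end + 1, split + 1))
        cost += sum(markers[i][1] != r_type for i in range(split + 1, r_start))
        if best_cost is None or cost < best_cost:
            best, best_cost = split, cost
    return best


def cluster(markers: List[Marker], min_seed: int = 3) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) blocks; markers before the first block and
    after the last block are left unassigned in this sketch."""
    blocks = extend_seeds(find_seeds(markers, min_seed))
    for k in range(len(blocks) - 1):
        split = best_split(markers, blocks[k], blocks[k + 1])
        blocks[k] = (blocks[k][0], split, blocks[k][2])
        blocks[k + 1] = (split + 1, blocks[k + 1][1], blocks[k + 1][2])
    result = []
    for k, (first, last, mtype) in enumerate(blocks):
        start, end = markers[first][0], markers[last][0]
        if k > 0:  # start just after the midpoint shared with the previous block
            start = (markers[first - 1][0] + markers[first][0]) // 2 + 1
        if k + 1 < len(blocks):  # end at the midpoint shared with the next block
            end = (markers[last][0] + markers[last + 1][0]) // 2
        result.append((mtype, start, end))
    return result


markers = [(100, "A"), (200, "A"), (300, "A"), (350, "B"), (400, "A"),
           (500, "A"), (600, "A"), (650, "B"), (660, "A"), (670, "A"),
           (700, "B"), (800, "B"), (900, "B")]
print(cluster(markers))  # [('A', 100, 685), ('B', 686, 900)]
```

In this example the lone "B" at 350 does not form a seed, so the two "A" seeds are merged across it, and the short mixed run at 650-670 is assigned by the split rule before the midpoint boundary is drawn.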
--primary-tag HC --secondary-tag UG --intersect-tag "UG+HC" > combined.vcf
* Combine two VCF files by their "CHROM" and "POS" fields; where the "ALT" fields differ, write the "ALT" value of the secondary file into the "SDIFF" field
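The combining rule above can be sketched in Python as follows. This is a hypothetical illustration, not vcf_process.pl itself: it assumes the output is the union of both files, that the primary record is kept when both files share a CHROM/POS, and that "SDIFF" is written as an INFO entry (`SDIFF=<ALT>`); header lines are simply dropped.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (CHROM, POS)


def index_records(lines: List[str]) -> Dict[Key, List[str]]:
    """Map (CHROM, POS) to the tab-split fields of each VCF data line."""
    records: Dict[Key, List[str]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        records[(fields[0], int(fields[1]))] = fields
    return records


def combine(primary_lines: List[str], secondary_lines: List[str]) -> List[str]:
    """Union of both files keyed by (CHROM, POS); primary records win, and a
    differing secondary ALT is recorded as SDIFF=<ALT> in the INFO column."""
    primary = index_records(primary_lines)
    secondary = index_records(secondary_lines)
    combined = []
    # CHROM is sorted as a plain string here ("chr10" sorts before "chr2")
    for key in sorted(set(primary) | set(secondary)):
        if key not in primary:
            combined.append("\t".join(secondary[key]))
            continue
        fields = list(primary[key])
        other = secondary.get(key)
        if other is not None and other[4] != fields[4]:  # column 5 is ALT
            tag = "SDIFF=" + other[4]
            # column 8 is INFO, where "." means empty
            fields[7] = tag if fields[7] == "." else fields[7] + ";" + tag
        combined.append("\t".join(fields))
    return combined
```

Keying on the (CHROM, POS) tuple is what makes records from both files line up even when they differ in ALT; a real tool would also need to merge the header lines.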