diff --git a/README.md b/README.md
index d232caf..fca0832 100644
--- a/README.md
+++ b/README.md
@@ -144,7 +144,7 @@ Because VCF files vary widely in format, **SNPGenie now requires** users to spec
 As usual, you will want to make sure to maintain the VCF file's features, such as TAB(\t)-delimited columns. Unlike some other formats, the allele frequency in VCF is a decimal.

 #### A Note on Reverse Complement ('-' Strand) Records
-Many large genomes have coding products on both strands. In this case, SNPGenie must be run twice: once for the '+' strand, and once for the '—' strand. This requires FASTA, GTF, and SNP report input for the '-' strand. I provide a script for converting input files to their reverse complement strand in the [Additional Scripts](#additional-scripts) below. Note that, regardless of the original SNP report format, the reverse complement SNP report is in a CLC-like format that SNPGenie will recognize. For both runs, the GTF should include all products for both strands, with products on the strand being analyzed labeled '+' and coordinates defined with respect to the beginning of that strand's FASTA sequence. Also note that a GTF file containing *only* '-' strand records will not run; SNPGenie does calculations only for the products on the current + strand, using the '-' strand products only to determine the number of overlapping reading frames for each variant.
+Many large genomes have coding products on both strands. In this case, SNPGenie must be run twice: once for the '+' strand, and once for the '-' strand. This requires FASTA, GTF, and SNP report input for the '-' strand. I provide a script for converting input files to their reverse complement strand in the [Additional Scripts](#additional-scripts) below. Note that, regardless of the original SNP report format, the reverse complement SNP report is in a CLC-like format that SNPGenie will recognize. For both runs, the GTF should include all products for both strands, with products on the strand being analyzed labeled '+' and coordinates defined with respect to the beginning of that strand's FASTA sequence. Also note that a GTF file containing *only* '-' strand records will not run; SNPGenie does calculations only for the products on the current + strand, using the '-' strand products only to determine the number of overlapping reading frames for each variant.

 ### Options