Updated '-' strand formatting
singing-scientist committed Jan 11, 2020
1 parent 2539f79 commit c72f4f0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -144,7 +144,7 @@ Because VCF files vary widely in format, **SNPGenie now requires** users to spec
As usual, you will want to make sure to maintain the VCF file's features, such as TAB(\t)-delimited columns. Unlike some other formats, the allele frequency in VCF is a decimal.

#### <a name="revcom"></a>A Note on Reverse Complement ('-' Strand) Records
-Many large genomes have coding products on both strands. In this case, SNPGenie must be run twice: once for the '+' strand, and once for the '' strand. This requires FASTA, GTF, and SNP report input for the '-' strand. I provide a script for converting input files to their reverse complement strand in the [Additional Scripts](#additional-scripts) below. Note that, regardless of the original SNP report format, the reverse complement SNP report is in a CLC-like format that SNPGenie will recognize. For both runs, the GTF should include all products for both strands, with products on the strand being analyzed labeled '+' and coordinates defined with respect to the beginning of that strand's FASTA sequence. Also note that a GTF file containing *only* '-' strand records will not run; SNPGenie does calculations only for the products on the current + strand, using the '-' strand products only to determine the number of overlapping reading frames for each variant.
+Many large genomes have coding products on both strands. In this case, SNPGenie must be run twice: once for the '+' strand, and once for the '-' strand. This requires FASTA, GTF, and SNP report input for the '-' strand. I provide a script for converting input files to their reverse complement strand in the [Additional Scripts](#additional-scripts) below. Note that, regardless of the original SNP report format, the reverse complement SNP report is in a CLC-like format that SNPGenie will recognize. For both runs, the GTF should include all products for both strands, with products on the strand being analyzed labeled '+' and coordinates defined with respect to the beginning of that strand's FASTA sequence. Also note that a GTF file containing *only* '-' strand records will not run; SNPGenie does calculations only for the products on the current + strand, using the '-' strand products only to determine the number of overlapping reading frames for each variant.
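
To make the coordinate convention above concrete, here is a minimal Python sketch of what converting to the reverse complement strand involves. It is an illustration only, not SNPGenie's conversion script: the function names `reverse_complement`, `flip_interval`, and `flip_strand` are hypothetical, and the sketch assumes 1-based, inclusive coordinates (as in GTF) and a plain A/C/G/T/N sequence.

```python
from typing import Tuple

# Hypothetical helpers for illustration; these are NOT part of SNPGenie.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_interval(start: int, end: int, seq_len: int) -> Tuple[int, int]:
    """Map a 1-based inclusive interval onto the reverse complement strand.

    Position p on the original sequence becomes seq_len - p + 1, so the
    old end becomes the new start and the old start becomes the new end.
    """
    return seq_len - end + 1, seq_len - start + 1


def flip_strand(strand: str) -> str:
    """Swap the strand label so the strand being analyzed is labeled '+'."""
    return {"+": "-", "-": "+"}[strand]
```

For example, under these assumptions a '-' strand product at positions 2 to 4 of a 10-nucleotide sequence becomes a '+' strand product at positions 7 to 9 of the reverse complement sequence.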

### <a name="options"></a>Options
