vcf2tsv – algorithm

How does the vcf2tsv function work? Unlike the Perl implementation, this vcf2tsv preserves all INFO and FORMAT tags.

It first scans the input file to build a unique list of all the INFO and FORMAT tags present in it (call these all-info-tags and all-format-tags). The sorted INFO tags become part of the output header. The FORMAT tags are interleaved with each sample name, so that every sample contributes one header column per FORMAT tag. It then processes the file line by line, and for each line it:

- creates the part of the output line that concerns the INFO field:
  - creates a map of the INFO field (e.g. "DP=17;GN=BRCA2;CN=INTRONIC" becomes {"DP" "17", "GN" "BRCA2", "CN" "INTRONIC"})
  - goes through all-info-tags and looks up each tag's value in this map, using an empty string if that tag is not present in the INFO string.
- creates the part of the output line that concerns the FORMAT and sample fields. For each individual, it:
  - creates a map by interleaving the split FORMAT field with that individual's sample data
  - goes through all-format-tags and looks up each tag's value in this map, using an empty string if that tag is not present in the sample data.
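The two passes described above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the vcf2tsv source. The function names `collect_tags`, `build_header`, `parse_info`, `convert_line` and `vcf_to_tsv` are hypothetical. Several details are also assumptions: copying the first seven VCF columns verbatim before the INFO columns, naming sample columns `sample_tag`, sorting the FORMAT tags as well as the INFO tags, and writing INFO flags (tags without `=`) as `true`.

```python
from typing import Iterable

# Assumption: the first seven VCF columns are copied through unchanged.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def parse_info(info: str) -> dict[str, str]:
    """Map an INFO string such as "DP=17;GN=BRCA2" to {"DP": "17", "GN": "BRCA2"}."""
    if info == ".":
        return {}  # "." means the INFO field is empty
    result = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Flags such as "DB" carry no value; recording them as "true" is an assumption.
        result[key] = value if sep else "true"
    return result


def collect_tags(lines: Iterable[str]) -> tuple[list[str], list[str], list[str]]:
    """First pass: return the sample names plus the sorted union of INFO and FORMAT tags."""
    samples: list[str] = []
    info_tags: set[str] = set()
    format_tags: set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        info_tags.update(parse_info(fields[7]))  # iterating a dict yields its keys
        if len(fields) > 8:
            format_tags.update(fields[8].split(":"))
    return samples, sorted(info_tags), sorted(format_tags)


def build_header(samples: list[str], info_tags: list[str], format_tags: list[str]) -> list[str]:
    # One column per (sample, FORMAT tag) pair; the "sample_tag" naming is hypothetical.
    sample_cols = [f"{s}_{t}" for s in samples for t in format_tags]
    return FIXED_COLUMNS + info_tags + sample_cols


def convert_line(line: str, info_tags: list[str], format_tags: list[str]) -> str:
    """Second pass: turn one VCF data line into one TSV row."""
    fields = line.rstrip("\n").split("\t")
    info_map = parse_info(fields[7])
    row = fields[:7] + [info_map.get(tag, "") for tag in info_tags]
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        for sample in fields[9:]:
            # Pair each FORMAT key with this sample's value; zip stops at the
            # shorter list, so trailing fields omitted by a sample are simply absent.
            sample_map = dict(zip(format_keys, sample.split(":")))
            row.extend(sample_map.get(tag, "") for tag in format_tags)
    return "\t".join(row)


def vcf_to_tsv(lines: list[str]) -> list[str]:
    """Run both passes over an in-memory list of VCF lines; returns header + rows."""
    samples, info_tags, format_tags = collect_tags(lines)
    out = ["\t".join(build_header(samples, info_tags, format_tags))]
    for line in lines:
        if line.strip() and not line.startswith("#"):
            out.append(convert_line(line, info_tags, format_tags))
    return out
```

Taking a list of lines keeps the two passes simple in this sketch. For a large file, one would more likely read it twice: once for `collect_tags` and once for the per-line conversion, rather than holding it in memory.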
