add options for more genotype formats #3

Open
janxkoci opened this issue Feb 15, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@janxkoci
Owner

The current vcfGTcount.gawk script could be extended to report not just the basic GT summaries, but also, e.g., translated genotypes (TGT in bcftools terminology) or even their IUPAC-encoded form (IUPACGT in bcftools). For example:

  • -t = count translated genotypes
  • -i = count IUPAC-formatted genotypes
  • -g = count numeric-style genotypes (default)

This would be handled by a function that is called after a genotype has been extracted and uses an if check to pick the output format.

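function translate(gt, ref, alt, iupac)
{
    # replace allele indices with bases (assumes a biallelic site)
    gsub(/0/, ref, gt)
    gsub(/1/, alt, gt)
    if (iupac == 1)
        gt = iupacdict[gt] # needs a global dict of IUPAC codes covering both allele orders and both / and | separators
    return gt
}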

This could be handled by a single function, but two separate functions might be more efficient, since the if (iupac == 1) check would then run once rather than on every genotype.

@janxkoci janxkoci added the enhancement New feature or request label Feb 15, 2023
@janxkoci
Owner Author

janxkoci commented Apr 14, 2023

After giving it some thought, I realized that most of the classic approaches would hurt performance, even when only the current functionality is wanted. This is because the extra if statements would be evaluated for every genotype.

But since the script already uses gawk features, I can solve it with indirect function calls: define multiple functions and select the one to use at run time. All I need is to parse the flags as above and store the name of the matching function in a variable, which gawk can then call as @var(...).

One thing to consider later would be combining multiple flags to get multiple stats in the output. But that can wait.

Projects
None yet
Development

No branches or pull requests

1 participant