Fetch over entire genes... #13

Open
carloartieri opened this issue Apr 3, 2016 · 0 comments

Comments

@carloartieri
Collaborator

I was thinking about it a little more and the idea of 'fetching' reads by SNP isn't going to work. The reason is that SNPs are frequently near enough to one another that reads can span more than one. While iterating over SNPs, this will re-fetch the same reads and lead to double-counting unless you store a list of already-counted SNPs in memory. Regardless, it will be relatively inefficient.
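To make the double-counting concrete, here is a minimal sketch in plain Python. The reads, SNP positions, and the `fetch` helper are all hypothetical stand-ins (no BAM library is used); `fetch` only imitates fetching reads that overlap a single position.

```python
# Hypothetical toy data: reads as (name, start, end), half-open coordinates.
reads = [
    ("read1", 100, 150),
    ("read2", 120, 170),
    ("read3", 300, 350),
]
snp_positions = [125, 140, 310]  # hypothetical SNP positions on one chromosome


def fetch(reads, pos):
    """Stand-in for a BAM fetch: return reads overlapping a single position."""
    return [r for r in reads if r[1] <= pos < r[2]]


# Naive per-SNP iteration: each read is fetched once per SNP it spans.
naive_total = sum(len(fetch(reads, pos)) for pos in snp_positions)

# Remembering which reads were already seen avoids the double count,
# at the cost of holding that set in memory.
seen = set()
for pos in snp_positions:
    for name, _start, _end in fetch(reads, pos):
        seen.add(name)
unique_total = len(seen)
```

With this toy input the naive loop tallies five reads although only three distinct reads exist, because `read1` and `read2` each span two SNPs.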

Therefore it makes more sense to accept the list of SNPs and a GTF file, then parse the GTF and fetch over whole genic regions one at a time. Going exon-by-exon would also work, as long as each fetch excluded reads overlapping the previous exon. For example, it's straightforward to create a column in a pandas dataframe, sorted by chrom+position, that notes the end position of the previous element in the table.
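A rough sketch of that previous-end column, assuming the exons have already been parsed from the GTF into a dataframe. The column names (`chrom`, `start`, `end`, `prev_end`) and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical exon table, as might be parsed from a GTF file.
exons = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr2", "chr1"],
        "start": [500, 100, 50, 250],
        "end": [600, 200, 150, 300],
    }
)

# Sort by chromosome and position, then note the end of the previous element.
# Grouping by chromosome keeps the first exon on each chromosome from
# inheriting an end position from a different chromosome (it gets NaN instead).
exons = exons.sort_values(["chrom", "start"]).reset_index(drop=True)
exons["prev_end"] = exons.groupby("chrom")["end"].shift()
```

When fetching over an exon, a read that starts before `prev_end` also overlaps the previous element and could be skipped. Nested or overlapping exons would probably need a running maximum of the previous ends rather than a single shift.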
