Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct binsize calculation #70

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Drmirdeep
Copy link

Binsize calculation is based on the gff input file and thus needs to add 1 for the region size. Since this was missing
an offset error accumulated with the number of bins used and the end of a region was never been used for calculation.
E.g. A region with 60 bins would lack the last 60bp of a region due to the offset error accumulation.

Binsize calculation is based on the gff input file and thus needs to add 1 for the region size. Since this was missing
an offset error accumulated with the number of bins used and the end of a region was never been used for calculation.
E.g. A region with 60 bins would lack the last 60bp of a region due to the offset error accumulation.
@jdimatteo
Copy link
Member

Sorry, this looks like a serious error.

@Drmirdeep Do you have time to add a unit test for this change ? Unit tests are here: https://github.com/BradnerLab/pipeline/blob/master/bamliquidator_internal/bamliquidatorbatch/test.py

@charlesylin does someone from your team have time to review these changes?

@jdimatteo
Copy link
Member

I'm sorry for my long delay and that I can't devote more time to this right now. I'll follow up with tests and rigorous review in a week or two if nobody else has time.

@jdimatteo
Copy link
Member

jdimatteo commented Dec 8, 2019

@Drmirdeep I discussed briefly with @charlesylin . I think you might be using a different coordinate system than the one assumed by bamliquidator. Can you please review the below article and advise me? As I understand it, most bamliquidator users expect the current behavior which uses the UCSC coordinate system, so maybe it would make sense to add an option to override the default coordinate system. I'd appreciate your recommendation here.

http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/

@Drmirdeep
Copy link
Author

No worries, everybody is quite busy. I may have some time over chrismas to make a unit test for it. Anyhow the tool demands a gff as input (1-based and fully closed) but then uses bed file logic internally (0-based and half open). Nonetheless, the interval sizes are somehow wrongly calculated in any case. With my little corrections it worked as expected for this part of the code. This I checked quite extensively already by running the internal bins which report back the coverage and the offset there is pretty obvious. I didn't take a closer look at the rest of the source which would need to much time now.

@jdimatteo
Copy link
Member

Can someone please assign themselves to review this? I'm sorry I can't for the foreseeable future. I can't just merge this because there are concerns that changing the bin size will break other people's workflow suddenly resulting in an unexpected change in behavior.

@Drmirdeep if you'd like to add an argument so that by default the bin size behavior remains unchanged, I'd feel more comfortable approving and merging this in.

@vsoch
Copy link
Contributor

vsoch commented Mar 18, 2020

@Drmirdeep you can also rebase with master to get fixes for testing, along with your change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants