Just wanted to report a behavior which I assume is intentional but might be misleading (or at least it was for me).
When using Cooler.pixels().fetch("<some_genomic_region>"), the RangeSelector1D returned by pixels() selects all the pixels whose first bin ID (bin1_id) belongs to the provided interval. As an example, using the yeast.10kb.cool file in the tests/data folder:
print(handle.extent("chrII")) # results in (33, 115)
The issue (in my opinion) is that it does not return all pixels with at least one bin in the region. There can be pixels where the second bin belongs to the region and yet they are not returned.
print(handle.pixels()[:].query("bin2_id >= 33 and bin2_id < 115 and bin1_id < 33"))
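To make the distinction concrete, here is a minimal sketch on a toy pixel table in plain Python. It does not use the cooler API. The pixel values and the helper names `first_bin_in_region` and `any_bin_in_region` are made up for illustration; only the region bounds (33, 115) come from the example above, and they are assumed to be half-open.

```python
# Toy upper-triangular pixel table: (bin1_id, bin2_id, count),
# sorted by bin1_id and then bin2_id.
# These values are invented; they are NOT taken from yeast.10kb.cool.
pixels = [
    (10, 12, 5),
    (10, 40, 2),    # bin2 inside the region, bin1 before it
    (32, 33, 7),    # bin2 inside the region, bin1 before it
    (33, 33, 9),
    (50, 120, 1),   # bin1 inside the region, bin2 after it
    (114, 114, 4),
    (115, 116, 3),
]

lo, hi = 33, 115  # assumed half-open bin extent of the region


def first_bin_in_region(rows, lo, hi):
    """Hypothetical helper: keep pixels whose bin1_id is in [lo, hi)."""
    return [r for r in rows if lo <= r[0] < hi]


def any_bin_in_region(rows, lo, hi):
    """Hypothetical helper: keep pixels with bin1_id or bin2_id in [lo, hi)."""
    return [r for r in rows if lo <= r[0] < hi or lo <= r[1] < hi]


fetched = first_bin_in_region(pixels, lo, hi)
expected = any_bin_in_region(pixels, lo, hi)

# Pixels that touch the region but are absent from the bin1-based selection.
missed = [r for r in expected if r not in fetched]
print(missed)
```

In this toy table the missed pixels are exactly the ones whose second bin falls in the region while the first bin lies before it, which is the same condition as in the query above.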
I understand why this might be the case. There is no easy way of indexing the bin2_id column, so finding these pixels would require iterating over and filtering all pixels that precede the retrieved slice. This would clash with the lazy fetching and would most likely be slow.
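The cost asymmetry can be sketched with a toy offset index in plain Python. This is only an illustration under the assumption that pixels are sorted by bin1_id and indexed by the row at which each bin1_id starts; the data and the names `rows_with_bin1_in` and `rows_with_only_bin2_in` are hypothetical and are not part of cooler.

```python
from bisect import bisect_left

# Hypothetical toy pixel columns, sorted by (bin1_id, bin2_id).
bin1 = [0, 0, 1, 1, 1, 3, 3, 4]
bin2 = [0, 3, 1, 2, 4, 3, 4, 4]
n_bins = 5

# Offset index: bin1_offset[i] is the first row whose bin1_id is >= i,
# so rows with bin1_id == i occupy bin1_offset[i]:bin1_offset[i + 1].
bin1_offset = [bisect_left(bin1, i) for i in range(n_bins + 1)]


def rows_with_bin1_in(lo, hi):
    """Two index lookups give a contiguous row range; no scan is needed."""
    return range(bin1_offset[lo], bin1_offset[hi])


def rows_with_only_bin2_in(lo, hi):
    """No index exists on bin2, so every row before the slice is inspected."""
    start = bin1_offset[lo]
    return [i for i in range(start) if lo <= bin2[i] < hi]


lo, hi = 3, 5  # toy region covering bins 3 and 4
print(list(rows_with_bin1_in(lo, hi)))
print(rows_with_only_bin2_in(lo, hi))
```

The first lookup touches only two index entries regardless of table size, while the second has to read every row before the slice, which is why it fits poorly with lazy fetching.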
Since I understand why this behavior might not change, I believe it should at least be clearly stated in the documentation (I did not find anything about it, but I might have missed it), because one could otherwise get misleading results. I would be happy to open a PR to either change or document the behavior if needed.