log transform in view command causes division by 0 #67

Open
etiennejourdier opened this issue Oct 6, 2022 · 5 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@etiennejourdier

Hi again,
In the view command, the --transform option with the values ln, log2 or log10 causes this error:
WARNING :: /usr/local/lib/python3.7/dist-packages/hicstuff/commands.py:467: RuntimeWarning: divide by zero encountered in log
I imagine this is due to the pairs without any contact, which are treated as zeros.

By the way, for these pairs with no contact:

  • with the sqrt transform, the pixel takes the color of the value 0, which is logical
  • without the transform option, the pixel is white whatever the colormap, which can be confusing, but is also useful to differentiate the 2 cases: very few contacts vs. no contacts at all.
    Maybe a "--color-blank-pixels" option would be useful?
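
For readers who want to reproduce the warning outside hicstuff, here is a minimal sketch using only NumPy. The `counts` array is made up for illustration and is not taken from hicstuff's code; it only assumes that empty pairs reach the log as zeros.

```python
import warnings

import numpy as np

# Made-up contact counts; the first pair has no contact at all.
counts = np.array([0.0, 1.0, 10.0])

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    logged = np.log(counts)

print(logged)                            # first entry is -inf, the others are finite
print([str(w.message) for w in caught])  # contains the divide-by-zero message
```

The result is still a usable array: only the zero entries become non-finite.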
@ABignaud
Member

ABignaud commented Oct 6, 2022

Hi again,
For the first point, it's just a warning about log(0), which returns a NaN. You should still get a normal output.
For the second point, I think it's due to the fact that the matrices are stored as sparse matrices (we don't save the positions with zeros) to save some memory. When we plot them, the zero points are transformed into NaN and can thus be plotted in a different color than the 0.
I know that matplotlib's savefig doesn't behave the same way depending on the output format, the resolution of the matrix and the number of dpi. Could you post the command line that you use?
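
To illustrate the sparse storage point, here is a rough sketch of how absent pairs can end up as NaN once the matrix is made dense. The triplet arrays and the `n_bins` value are hypothetical, and this is not hicstuff's actual loading code.

```python
import numpy as np

# Hypothetical sparse triplets (row, col, contacts); pairs with no contact are simply absent.
rows = np.array([0, 0, 1])
cols = np.array([0, 1, 1])
contacts = np.array([5.0, 2.0, 7.0])

n_bins = 3

# Start from an all-NaN matrix so that pairs never stored stay "missing" rather than 0.
dense = np.full((n_bins, n_bins), np.nan)
dense[rows, cols] = contacts
dense[cols, rows] = contacts  # contact maps are symmetric

print(dense)
print("pairs without any stored contact:", int(np.isnan(dense).sum()))
```

A plotting library can then draw the NaN cells differently from cells holding an explicit 0.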

@ABignaud ABignaud added enhancement New feature or request question Further information is requested labels Oct 6, 2022
@etiennejourdier
Author

For the first point, it's just a warning about log(0), which returns a NaN. You should still get a normal output.

Indeed I had not noticed that the image is actually generated, sorry about that.

Could you post the command line that you use?

hicstuff view --lines --transform=... --cmap=winter --output=out/xx.png --frags=out/xx.frags.tsv out/xx.mat.tsv
(here with a colormap without white, to clearly see the difference between zeros and NaN):

  • without the transform option, the zeros (= no contact) are not colored:
    [image: Rut2-no]

  • but with the sqrt transform, the same zeros (= no contact) become colored:
    [image: Rut2-sqrt]

  • and the same for the exp0.2 transform:
    [image: Rut2-pow]

  • but with the ln transform, the no-contact pixels are not colored. By the way, does the zero in the color scale mean 0 contacts or 1 contact? In other words, is it a ln(1+nb) or a ln(nb) transformation?
    [image: Rut2-ln]

I am a beginner with hicstuff, so I don't know if this remark is relevant. But I find it interesting to be able to color the no-contact pixels differently from the weak contacts, to better see the weak contacts.

@ABignaud
Member

ABignaud commented Oct 7, 2022

Hi,
You should normalize the matrix using --normalize if you want to transform it; it would be better. Furthermore, here you don't use your color scale at all: everything is either blue or white.
For the 0 and NaN: we usually use a colormap where the actual 0 is white, so we don't see the difference between 0 and NaN, and it's easier to explain that white means a low frequency of contacts and color a high frequency of contacts. As an example, you can use Reds (or any other color) or afmhot_r. It should help you avoid this problem.
Usually we don't distinguish between close to zero contacts and no contact, as it's roughly the same thing. So we don't really want to color them differently, except for the white lines, which are a mapping issue, but those are easily visible as even the main diagonal is empty.
For the ln, it's a ln(x) transformation. The zeros are still white, as ln(0) = -np.inf in Python and is thus treated like a NaN by matplotlib, so it will be colored white by matplotlib.
If you want some examples of how to use hicstuff, there are some demos here: https://github.com/koszullab/hicstuff/blob/master/doc/notebooks/
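
If someone does want blank pixels in their own color, it can be done directly in matplotlib. The sketch below is not a hicstuff option: the 3x3 `contacts` matrix, the "winter" colormap and the "lightgrey" color are arbitrary choices for illustration. It applies ln(x), masks the non-finite values, and sets the colormap's "bad" color.

```python
import copy
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 3x3 contact map; 0 means "no contact recorded".
contacts = np.array([[9.0, 3.0, 0.0],
                     [3.0, 8.0, 1.0],
                     [0.0, 1.0, 7.0]])

# ln(x): zeros become -inf; the divide-by-zero warning is silenced on purpose.
with np.errstate(divide="ignore"):
    logged = np.log(contacts)

# Mask NaN and +/-inf so matplotlib draws them with the colormap's "bad" color.
masked = np.ma.masked_invalid(logged)

# Copy the colormap before modifying it, then choose a color for blank pixels.
cmap = copy.copy(plt.get_cmap("winter"))
cmap.set_bad("lightgrey")

fig, ax = plt.subplots()
image = ax.imshow(masked, cmap=cmap)
fig.colorbar(image, ax=ax, label="ln(contacts)")

# Written to memory here; pass a file path instead to save the figure to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
```

With this, a value of 0 on the color scale corresponds to 1 contact, and pixels with no contact show up in grey instead of white.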

@etiennejourdier
Author

OK, thanks for the explanation. I will normalize and use the afmhot_r colormap.

But I still have a question about this :

except for the white lines, which are a mapping issue, but those are easily visible as even the main diagonal is empty

I noticed that the normalization procedure actually creates white lines, since all the bins not included in the normalization procedure (outside the 3 MAD) become empty. In the example below, the diagonal of the centromere disappeared, and 2 blank lines appeared before and after (as a novice, I find this behavior strange!). So if I directly normalize, I can't differentiate between low coverage bins and mapping issue, right ?

note : maybe the SRA I used for these tests is of poor quality ? with too many uncut events ?

  • before normalisation: hicstuff view --transform=sqrt --max=95%
    [image: Rut2]

  • after normalisation: hicstuff view --normalize --transform=sqrt --max=95%
    [image: Rut2-norm]

@ABignaud
Member

Hi,

So if I directly normalize, I can't differentiate between low coverage bins and mapping issue, right ?

Low coverage bins are usually the ones with mapping issues. So if I understand your question, you want to differentiate the bins with low coverage from the ones with no coverage at all. To keep the ones with low coverage, you can increase the MAD threshold to 10, for example, and they will be included in the normalization.
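
To make the effect of the MAD threshold concrete, here is a simplified sketch of a MAD-based bin filter. It is an assumption about the general idea (median and median absolute deviation of the log bin sums), not hicstuff's actual implementation; the function name `keep_bins_by_mad` and the example matrix are hypothetical.

```python
import numpy as np


def keep_bins_by_mad(matrix, n_mad=3.0):
    """Return a boolean mask of bins whose log10 coverage lies within
    n_mad median absolute deviations of the median (simplified, two-sided)."""
    bin_sums = np.asarray(matrix, dtype=float).sum(axis=1)
    covered = bin_sums > 0  # bins with no contact at all are always dropped
    log_sums = np.log10(bin_sums[covered])
    median = np.median(log_sums)
    mad = np.median(np.abs(log_sums - median))
    keep = np.zeros(bin_sums.shape, dtype=bool)
    keep[covered] = np.abs(log_sums - median) <= n_mad * mad
    return keep


# Hypothetical map: five well-covered bins, one low coverage bin, one empty bin.
toy_matrix = np.diag([100.0, 110.0, 95.0, 105.0, 90.0, 50.0, 0.0])

strict = keep_bins_by_mad(toy_matrix, n_mad=3.0)
loose = keep_bins_by_mad(toy_matrix, n_mad=10.0)
print(strict)  # the low coverage bin and the empty bin are excluded
print(loose)   # only the empty bin is excluded
```

In this toy case, raising the threshold from 3 to 10 brings the low coverage bin back into the normalization while the empty bin stays blank.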

maybe the SRA I used for these tests is of poor quality ?

I have seen better contact maps, but that's not so bad. What's the binning size?

with too many uncut events ?

Did you use the filter option when you launched hicstuff? It gives you the ratio of uncut events. For organisms other than mammals or yeast, for which the protocol has been optimized, it's hard to get fewer than 50% of uncuts.
