Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inf test fails #172

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Inf test fails #172

wants to merge 6 commits into from

Conversation

tdhock
Copy link
Owner

@tdhock tdhock commented Nov 3, 2016

This branch implements a test that fails for infinite data. This used to be working when we saved the built data to tsv files (before axis updates). See screenshot below.

  • Left: the current animint code which renders segments incorrectly (outside of their panels).
  • Right: rendering in In ggplot2, infinite data are rendered on the edge of the panel.

screenshot-inf-full

Note also that the DOM panel shows that the JavaScript code is not able to tell the difference between positive and negative infinity (they are both read as NaN).

To fix this I propose to modify the compiler to replace Inf/-Inf values with the appropriate finite values that correspond to the panel border positions (scale limits). These scale limits will be different for different showSelected subsets, if axis_updates is on. @faizan-khan-iit do you have any ideas about where to change the compiler in order to achieve this?

TODO add a test for infinite values in a plot with axis updates.

@faizan-khan-iit
Copy link
Collaborator

@tdhock That sounds reasonable. But it might be a bit tricky.

We will have to edit the built data in the ggplot.list since that is used to compute the domains. https://github.com/tdhock/animint/blob/master/R/animint.R#L1816-L1817

We check for domains after saving the layers, which might require some attention. I could look into it further, if you would like me to.

Quick question: Was this handled by ggplot2:::coord_transform() when we were using ggplot2 v1.0.1? https://github.com/tdhock/animint/blob/classic/R/animint.R#L711-L713

If yes, we could save some time and edit the data at the same place. This should work for non-update cases at least...

@tdhock
Copy link
Owner Author

tdhock commented Nov 4, 2016

for this example we have

Browse[2]> meta$built$data
[[1]]
  x xend    y yend PANEL group colour size linetype alpha
1 1    1 -Inf    1     1    -1  black  0.5        1    NA
2 2    2    0    2     1    -1  black  0.5        1    NA
3 3    3   -1  Inf     1    -1  black  0.5        1    NA

[[2]]
     x xend y yend PANEL group colour size linetype alpha
1 -Inf    1 1    1     4    -1  black  0.5        1    NA
2    0    2 2    2     4    -1  black  0.5        1    NA
3   -1  Inf 3    3     4    -1  black  0.5        1    NA

Browse[2]> 

@tdhock
Copy link
Owner Author

tdhock commented Nov 4, 2016

however in this example, compute_domains is not called at all (I guess since there are not showSelected variables)

in this case, how are the domains computed?

@tdhock
Copy link
Owner Author

tdhock commented Nov 4, 2016

I would think that after computing domains, and before saving the data to tsv files, we should replace the infinite values with the corresponding finite value that is at the edge of the domain

@tdhock
Copy link
Owner Author

tdhock commented Nov 4, 2016

and yes I think this was handled by coord_transform in old ggplot2

however I don't think we should go back to using that, since it does not work with update_axes.

now I think we should be systematically using your compute_domains

@faizan-khan-iit
Copy link
Collaborator

compute_domains is only called when update_axes is used in theme_animint. (see here)
So any plots not having axis updates will just ignore this, as we don't need to compute domains and save them to our plot.json file.

So when we have no axis-updates, as in this example, we could just replace the values accordingly before saving the layer. So I suggested that we could do it at the same place the coord_transform function used to be, just after where switch_axes function is now

But this creates a problem for when we do have axis-updates, since we check for that after the layer data has been saved. This is problematic since the Inf values will already have been replaced when we proceed to computing the subsets afterwards.

A straightforward solution could be to move the saveLayer call after the whole axis update code.

If we do proceed with this solution, the compute_domain function may require the addition of finite=TRUE calls wherever we calculate range/min-max. After we have computed the min/max for a particular subset, we will have to edit the data in ggplot.list, so that the value gets saved to the TSV.

This is surprisingly more complicated than I thought it would be...

@faizan-khan-iit
Copy link
Collaborator

@tdhock I tried implementing the above solution when there are no axes updates, and unfortunately it seems like it will get complicated to deal with the Inf values in the compiler. I need give this a bit more thought as a stright-forward quick fix doesn't seem to work..

@tdhock
Copy link
Owner Author

tdhock commented Nov 14, 2016

thanks for trying to implement that solution. what exactly was the problem that you encountered?

it seems to me that it should be possible to do the following: (1) use compute_domains to get scale limits for each showSelected subset, (2) replace Inf values with scale limits for each showSelected subset, (3) write those values to tsv files via saveLayer

@tdhock
Copy link
Owner Author

tdhock commented Nov 14, 2016

by the way there is a workaround: just replace the Inf values with finite values in the data set or aes definition. It is annoying and the values are not plotted on the edge of the panel, but it works ... for example here is an animint I presented last Friday http://bl.ocks.org/tdhock/raw/9311ca39d643d127e04a088814c81ee1/ where the second plot should have Inf values but I replaced them with finite values https://github.com/tdhock/PeakSegFPOP-paper/blob/d53e389bb86a5646d44c9fffbf72a0911409714f/figure-large-margin.R#L475

@faizan-khan-iit
Copy link
Collaborator

@tdhock In the case when compute_domains is not called, i.e. in the case when there are no axis updates, I tried to replace the Inf values with scale limits. I tried replacing them in the saveLayer function, but that got complicated as we only deal with plotting data in that function, and not the panel info. So I could not find a straightforward way to replace the values with scale limits.

it seems to me that it should be possible to do the following: (1) use compute_domains to get scale limits for each showSelected subset, (2) replace Inf values with scale limits for each showSelected subset, (3) write those values to tsv files via saveLayer

I think this is the way to go. I will try and see how this works out..

Regarding the replacing of values in the original dataset, I think it would be worthwhile to handle this in the animint code, as some users may find it bothersome to replace these values beforehand.

As an alternative, is it possible to handle this in the renderer?

@tdhock
Copy link
Owner Author

tdhock commented Nov 14, 2016

thanks for clarifying the problem. that makes sense that we can't do it entirely in saveLayer, since we do not have access to the scale limits (domains) there.

and yes I agree it is worthwhile to handle it in the animint code, as that is what ggplot2 does.

it is definitely not possible to handle this only in the renderer, since Inf values in the tsv files are read into JavaScript as NaN.

@faizan-khan-iit
Copy link
Collaborator

faizan-khan-iit commented Dec 27, 2016

@tdhock I have made some changes that I want to be reviewed:

  1. As I was working on this, I was going through parsePlot and saveLayer functions and it was a little confusing to understand the code, specially due to everything being passed through the meta environment. Then I saw this comment, and decided to clean up the functions a little, so that only the global values be stored in the environment. However I have made minimal changes, but I strongly suggest we should reduce global values as much as possible.
  2. In my previous comment I mentioned that I could not fix the bug in the saveLayer function, so I split it up into two functions, so that the values are stored after the Inf values are replaced. The computation is done in saveLayer itself but the values are saved in TSV files in storeLayer.
  3. Currently the Inf values get replaced only for plots where no axes updates are mentioned. If this approach gets the OK, I will implement it for the axes updates too. I have thought this over, but currently there is no easy method to handle the Inf values in a single place. We need to handle them separately for non-updating and updating plots.
  4. How should the different geoms react to Inf values? For eg: If a point has (Inf, Inf) coordinates in a scatter plot, should it be displayed at the top right corner of the plot (as in ggplot2), or not displayed at all? Currently the Inf values are just replaced with the most extreme values in the panel, regardless of the geom specified.

With these changes, the previous example seems to work well: https://bl.ocks.org/faizan-khan-iit/a58ad738149c016b485af3e697ae6102

@tdhock
Copy link
Owner Author

tdhock commented Jan 3, 2017

Hey Faizan thanks for all of these changes, especially for cleaning up the parsePlot / meta mess.

About point 3. can you please explain why it is not possible to handle the Inf values in the same way for updating and non-updating plots? It seems to me that non-updating plots should be a special case. (only 1 set of axis ranges instead of several)

@faizan-khan-iit
Copy link
Collaborator

Well the internals look a bit better now. I will try and clean up more afterwards.

Regarding the case of non-updating plots as a particular case of updating plots with only one axis range: When I implemented the axis updates, I edited the compiler so that the non-updating plots skip the domain computation (see this line), since that was redundant as the axis ranges had already been calculated before, and also because the renderer would have to display the same axis range every time. So when replacing the Inf values, we will need to edit the data at two places. For updating plots, we will need to add it somewhere around here, while for non-updating plots, we already have that here.

I guess what you would like is a single function that handles both the cases at the same place. If that is so, I will need to edit the current code that handles the axis updates.

What I was suggesting was that I could just add another function to handle Inf values for updating plots.

@tdhock
Copy link
Owner Author

tdhock commented Jan 4, 2017

Thanks for your very clear explanation Faizan. However I'm not sure to have understood the point about "the renderer would have to display the same axis range every time." It seems to me that if there is only one axis domain/range, then the renderer should just render that axis at the beginning, and changing the selection should not change the axis. Am I missing something?

So I guess we are confronted with a design choice: either

(1) compute axis domains for updating and non-updating plots differently. Advantage: it uses the range that has already computed when building the ggplot, so it avoids repetition. Disadvantage: we have to implement Inf handling (and possibly other things later) twice -- once for updating and once for non-updating plots.

(2) compute axis domains the same way for updating and non-updating plots, which means that we ignore the domain from the built ggplot, and instead always use your domain computation code. Advantage: we only need to implement Inf handling once. Disadvantage: it may be a little bit slower for non-updating plots.

Do you see any other advantages/disadvantages to add to these choices?

I think design choice (1) will be easier to maintain in the long run, since there is just once place in the code that we will know is responsible for domain computation.

@faizan-khan-iit
Copy link
Collaborator

I think your comment pretty much sums up the situation.

I was also thinking about the point you raised and I think that having the code for replacing Inf at two different places is a bit of bad programming practice, is it not? I would certainly like to avoid that, but that means that the domain computation code needs to be changed, which as you pointed out will be difficult to maintain later.

However I'm not sure to have understood the point about "the renderer would have to display the same axis range every time."

No need to worry regarding this. It won't happen. It was a bit of mistake on my part. I had forgotten that I had already integrated a check in the renderer regarding this.

Regarding the pros and cons of the choices, choice (2) will not be slower once the plot has been generated. Of course the compiler will take a bit more time. Also, there are concerns with maintenance in both cases. Any bugs in Inf value code will have to be corrected at two different places, which is a disadvantage for (1).

So finally, I think despite introducing a bit of redundant code, choice (1) is better, but only slightly.

@tdhock
Copy link
Owner Author

tdhock commented Jan 4, 2017

ok, thanks for the clarification.

but actually I made a mistake, I meant to say that choice (2) will be easier to maintain in the long run, since there is just once place in the code that we will know is responsible for domain computation for all plots.

haha.

does that change your opinion?

@tdhock
Copy link
Owner Author

tdhock commented Jan 4, 2017

and I agree with your comment that "having the code for replacing Inf at two different places is a bit of bad programming practice"

@faizan-khan-iit
Copy link
Collaborator

Ok then. I will alter the implementation of the axes updates a bit (choice 2). It should benefit us in the future and is a better programming practice.

As for point 4 regarding the geoms, do you have any comments, or should I just mirror the ggplot results?

@tdhock
Copy link
Owner Author

tdhock commented Jan 5, 2017

great. about point 4, I think "the Inf values are just replaced with the most extreme values in the panel" is almost the same thing that ggplot2 does. Rather than the most extreme finite data point value, I think it should be the most extreme domain value (which should be a bit more extreme than the most extreme finite data point value). Do you see any geoms for which this would not work?

@faizan-khan-iit
Copy link
Collaborator

To be fair, I have not worked with Inf values much, so I do not yet know how different geoms respond to it in ggplot2. I will cook up some examples and let you know...

@faizan-khan-iit
Copy link
Collaborator

I tried to create various cases for Inf values for different geoms, and I think ggplot2 always does the same: replace the values with the most extreme values in the Panel. So I will also replace the Inf values with the most extreme ones in the concerned subset, regardless of the geom.

@tdhock
Copy link
Owner Author

tdhock commented Jan 10, 2017

that sounds great. Thanks Faizan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants