Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

legend updates needed for divergent colormaps for continuous overlay variables #454

Closed
danicahelb opened this issue Jan 27, 2023 · 20 comments · Fixed by VEuPathDB/web-monorepo#42 · May be fixed by VEuPathDB/web-eda#1696
Closed
Assignees

Comments

@danicahelb
Copy link

danicahelb commented Jan 27, 2023

From @bobular

In GEMS1, BMI-for-age z-score ranges from -15 to +273, but when i use this term as an overlay, the legend indicates it ranges from -30 to +30. (I thought it could have something to do with not having data for the participant whose BMI-for-age z-score was 273 but checked and this is not the case)

I've had a quick look. The code is creating a symmetrical gradient around zero and is using the curated displayRangeMin and displayRangeMax in preference to the data-derived rangeMin and rangeMax. (This preference has been standard practice until now, I believe.) So this is why we get -30 to 30 (because it creates a symmetric range using max(abs())

image

image

image

@danicahelb
Copy link
Author

danicahelb commented Jan 27, 2023

In 1/25/2023 data viz group, we discussed how to set the default range for gradient overlays and how does it map to hues in the gradient

decision is that the gradient colormap should be consistent w what we do for the axes:

“Axes representing user-selected continuous variables will scale according to study metadata (meaning use the annotated display range from the owl file when available, expand to the full data set range if we see any values in the response which exceed that)”

Unlike what we do for the axes, if the resulting range does not include 0 we should NOT expand the range to do so. currently we do force the gradient map to run through 0 even when the data doesn't and this makes it hard to distinguish the colors. see this example from GEMS1:
Image

@danicahelb
Copy link
Author

also, in the 1/25/2023 dataviz meeting, we decided that -- for diverging colormaps -- when the range isn't symmetrical (ie, -5 to 25 instead of -25 to 25), we want to cut the gradient off on the early side instead of forcing the hues to be mapped to the absolute value of the larger

@danicahelb danicahelb changed the title outlier values for continuous overlay variables are not represented in the legend legend updates needed for divergent colormaps for continuous overlay variables Jan 27, 2023
@dmfalke dmfalke transferred this issue from VEuPathDB/EdaNewIssues Feb 16, 2023
@moontrip moontrip self-assigned this Mar 14, 2023
@moontrip
Copy link
Contributor

moontrip commented Mar 16, 2023

Hi @danicahelb May I ask two questions to clarify?

First, for the case of Bob's screenshot above, displayMinMax = (-30, 30) and rangeMinMax = (-15.3, 273.91), then should the new range be (-30, 273.91)? Second, do we need to consider subsetted (filtered) data? For example, assuming that the range of the displayMinMax and rangeMinMax are the same with the first question and somehow subsetted data has MinMax = (10, 20), then do we need to respect the combination of the displayMinMax and rangeMinMax (e.g., -30, 273.91) or just use data-based MinMax (10, 20) for the colormap range?

In 1/25/2023 data viz group, we discussed how to set the default range for gradient overlays and how does it map to hues in the gradient

decision is that the gradient colormap should be consistent w what we do for the axes:

“Axes representing user-selected continuous variables will scale according to study metadata (meaning use the annotated display range from the owl file when available, expand to the full data set range if we see any values in the response which exceed that)”

Unlike what we do for the axes, if the resulting range does not include 0 we should NOT expand the range to do so. currently we do force the gradient map to run through 0 even when the data doesn't and this makes it hard to distinguish the colors.

@bobular
Copy link
Member

bobular commented Mar 20, 2023

This seems complicated. I can't see a clear feature request in my original message.

Can we clarify what we mean here:

for diverging colormaps -- when the range isn't symmetrical (ie, -5 to 25 instead of -25 to 25), we want to cut the gradient off on the early side

So let's say a regular -25 to +25 color map goes from dark blue to dark red.

Does the legend for -5 to +25 go from very light blue to dark red?

@bobular
Copy link
Member

bobular commented Mar 20, 2023

DK's question about filtered ranges might depend on whether the overlay variable was directly or indirectly filtered - aka "filter-aware" behaviour?

@moontrip
Copy link
Contributor

@bobular First of all, thank you for looking into this ticket 👍 Concerning your question, all legends (both normal and colormap) utilize pre-defined color scale from the first to the end. Thus, if the range will be -5 to +25, I think it should be from dark blue to dark red which would be more consistent throughout the style of colormap: that is, a minimum is dark blue and a maximum is dark red.

@moontrip
Copy link
Contributor

DK's question about filtered ranges might depend on whether the overlay variable was directly or indirectly filtered - aka "filter-aware" behaviour?

@bobular Overlay variable contains metadata for rangeMin/Max and/or displayMin/Max. I have checked those parameters after subsetting/filtering, but they did not change. Actually this is the reason why we computed data-based min/max for Scatterplot, Lineplot, etc. (for axis range controls)

Can you please confirm if you intended to consider filtered range as well? I think it should be like other Vizs (Scatterplot etc.) but I wanted to make sure of it before going for it because while it would be doable, it will take some time. I have already made a change to handle displayMin/Max and rangeMin/Max but data-based Min/Max in this case is somewhat tricky because we need to firstly compute data-based Min/Max (including facet case) and then change data based on the Min/Max (marker colors, etc.): on the other hand, for an example of Scatterplot Viz, we just used data-based Min/Max for setting axis range control, not changing data itself.

@bobular
Copy link
Member

bobular commented Mar 22, 2023

@bobular First of all, thank you for looking into this ticket +1 Concerning your question, all legends (both normal and colormap) utilize pre-defined color scale from the first to the end. Thus, if the range will be -5 to +25, I think it should be from dark blue to dark red which would be more consistent throughout the style of colormap: that is, a minimum is dark blue and a maximum is dark red.

Oh, I think it needs to use the middle (white-ish) color for zero. @danicahelb please could you confirm this and my question above.

@bobular
Copy link
Member

bobular commented Mar 22, 2023

DK's question about filtered ranges might depend on whether the overlay variable was directly or indirectly filtered - aka "filter-aware" behaviour?

@bobular Overlay variable contains metadata for rangeMin/Max and/or displayMin/Max. I have checked those parameters after subsetting/filtering, but they did not change. Actually this is the reason why we computed data-based min/max for Scatterplot, Lineplot, etc. (for axis range controls)

Can you please confirm if you intended to consider filtered range as well? I think it should be like other Vizs (Scatterplot etc.) but I wanted to make sure of it before going for it because while it would be doable, it will take some time. I have already made a change to handle displayMin/Max and rangeMin/Max but data-based Min/Max in this case is somewhat tricky because we need to firstly compute data-based Min/Max (including facet case) and then change data based on the Min/Max (marker colors, etc.): on the other hand, for an example of Scatterplot Viz, we just used data-based Min/Max for setting axis range control, not changing data itself.

@moontrip you are correct the variable metadata doesn't change when we filter on that variable. I am jumping the gun on this issue. I think we need new tickets for filter-aware axis range and overlay colormap behaviour. I was just asking Danielle about it in #frontend-dev

@moontrip
Copy link
Contributor

@bobular yes I am aware of the filter-aware metadata work by Danielle, as I left a comment in my PR, VEuPathDB/web-eda#1696 (comment): but I think it is still underway?

That color mapping stuff is interesting. I didn't know we should stick to zero=white for a divergent colormap. I will see the decision and act 😃

@bobular
Copy link
Member

bobular commented Mar 22, 2023

@bobular yes I am aware of the filter-aware metadata work by Danielle, as I left a comment in my PR, VEuPathDB/web-eda#1696 (comment): but I think it is still underway?

That color mapping stuff is interesting. I didn't know we should stick to zero=white for a divergent colormap. I will see the decision and act smiley

Oh yes, sorry I didn't see (or did see, then forgot) that you were already on top of the filter-responsive gradient handling.

The important point about filter-aware handling is that if the user has filtered directly on the variable that is used in the overlay/gradient then we use the filter's min/max (not the back end response-derived min/max).

For axis ranges (part of a future ticket) the "Full" range will use annotated range and the "Auto zoom" will use the filter's min/max, again, if that variable has been directly filtered on. Sorry that was wrong. Let's wait for the new ticket(s)

There's no back end work to wait on here, so we could implement the filter-aware gradient handling right now, if you like.

@bobular
Copy link
Member

bobular commented Mar 22, 2023

The filter-aware overlay gradient handling can also wait for a new ticket if you like.

@moontrip
Copy link
Contributor

moontrip commented Mar 22, 2023

@bobular yes I am aware of the filter-aware metadata work by Danielle, as I left a comment in my PR, VEuPathDB/web-eda#1696 (comment): but I think it is still underway?
That color mapping stuff is interesting. I didn't know we should stick to zero=white for a divergent colormap. I will see the decision and act smiley

Oh yes, sorry I didn't see (or did see, then forgot) that you were already on top of the filter-responsive gradient handling.

The important point about filter-aware handling is that if the user has filtered directly on the variable that is used in the overlay/gradient then we use the filter's min/max (not the back end response-derived min/max).

For axis ranges (part of a future ticket) the "Full" range will use annotated range and the "Auto zoom" will use the filter's min/max, again, if that variable has been directly filtered on.

There's no back end work to wait on here, so we could implement the filter-aware gradient handling right now, if you like.

@bobular Thank you for your detailed information but I am afraid I am missing something or confused. If I understand correctly, you are mixing the term "filter-aware" in your explanation? There are two things concerning the term "filter-aware": a) filter-aware metadata/annotation (VEuPathDB/EdaDataService#240) - still draft PR; b) "filter-aware gradient handling" at the end of your sentence is the same with the "filter-responsive gradient handling" (computing data/response-based Min/Max) at the top sentence?
Current implementation at the PR is definitely a filter-responsive gradient handling as data-based Min/Max is computed, but if filter-aware metadata is available, then we can certainly use it instead of computing Min/Max from data/response.

@bobular
Copy link
Member

bobular commented Mar 22, 2023

Hi @moontrip

Sorry for the delay getting back to you. I had meetings and then had to think about this carefully.

Let's leave filter-awareness to the new tickets which I'll make soon. I have said some confusing things in this ticket, sorry about that.

Let's rewind back to the initial specs that @danicahelb provided above: #454 (comment)

decision is that the gradient colormap should be consistent w what we do for the axes:
“Axes representing user-selected continuous variables will scale according to study metadata (meaning use the annotated display range from the owl file when available, expand to the full data set range if we see any values in the response which exceed that)”
Unlike what we do for the axes, if the resulting range does not include 0 we should NOT expand the range to do so.

What this means is that we should not set the range based on the min/max of the response data, as is currently implemented in this PR. e.g. here

I think what we need is to add a new option to useDefaultAxisRange (and numberDateDefaultAxisRange) to NOT use the zero in the Math.min(...). Furthermore this function is becoming unreadable. One option might be to use lodash.min() because this seems to ignore null values, so many of the ? :s could be avoided (just keep the logscale-related ones).

Then we can use useDefaultAxisRange for overlay ranges also.

Then we can come back to the zero-straddling ranges, and what to do about the colors after @danicahelb has clarified the requirements I asked about here.

@moontrip
Copy link
Contributor

@bobular thank you for your detailed information and suggestions. It is very helpful for me to figure out what we want to implement 👍 Below are my thoughts.

First, data-based min/max is only used when filters is not empty: i.e., subsetted. It is because without filters, rangeMin/Max from metadata are essentially the same with data-based min/max. As I said in my PR, if there is no filter, I just compared displayMin/Max and rangeMin/Max.

Second, from my understanding, lodash.min/max throws error like Math.min/max or return undefined if an undefined or null (variable/value) in the function arguments is used. e.g., _.min(1,2,undefined/null) returns undefined.

Third, it is not clear to me if you want to send 'undefined' to useDefaultAxisRange() for observed Min/Max (data-based Min/Max) under current situation (without filter-aware metadata)? Certainly we can send observed Min/Max to the hook and forcibly set axisRangeSpec: 'Full' as an argument of the hook: just FYI, that is basically the same to what I have currently implemented in the PR for non-filter case.

So basically, a question is how we handle filter/subset case for the range.

@bobular
Copy link
Member

bobular commented Mar 23, 2023

Hi @moontrip
The answer is that we do nothing in this PR with respect to filtered (directly or indirectly) vs non-filtered data.

@danicahelb didn't answer your question below, but I clarified yesterday with Dave/Danielle that the filter-aware behaviour is as described in this new ticket. Overlay variable color gradient ranges are never scaled to the min/max of the data response (i.e. similar to "auto zoom" mode for axis ranges)

Second, do we need to consider subsetted (filtered) data? For example, assuming that the range of the displayMinMax and rangeMinMax are the same with the first question and somehow subsetted data has MinMax = (10, 20), then do we need to respect the combination of the displayMinMax and rangeMinMax (e.g., -30, 273.91) or just use data-based MinMax (10, 20) for the colormap range?

The task for this ticket is to remove the "start at zero" for the overlay "axis".

So useDefaultAxisRange can handle this nicely I think.

lodash does return undefined for _.min(1,2,undefined,null) (I didn't know that)
but it returns 1 for _.min([1,2,undefined,null]) - sorry I wasn't clearer about this yesterday.

For your third question, I don't think it matters if we pass observed min/max or undefined into useDefaultAxisRange() or not. The subsetted range of a variable should always be smaller than or equal to the automatically annotated dataset rangeMin/Max. I think the only reason observedMin/Max are arguments to the function is because the observedMin is important for logscale handling. I think also the observed min/max are a safety measure. Just in case the automatically annotated min/max are incorrect. If in doubt, I would send these values, if we have them easily to hand.

@moontrip
Copy link
Contributor

@bobular Thank you for your detailed explanations 👍 I didn't consider the case the array form with the lodash min/max. Will test it and change the hook and associated function: hope TS does not complain 😄

Will probably rework this PR early next week as I am currently engaged with other ticket(s).

@moontrip
Copy link
Contributor

moontrip commented Mar 23, 2023

@bobular Out of curiosity, I just did a quick test for lodash min (or max) with the array form in the jsfiddle. But unfortunately, the array form still throws error if one of elements comprised of variables is not defined (i.e., non-existent). For example, a=1 & c=2, then _.min([a, b, c]) throws error because b is not defined. This will be the case if, e.g., displayMin does not exist in the data while data.value.rangeMin exists, then _.min([data.value.displayMin, data.value.rangeMin]) will still throw error.

perhaps this can be avoided by checking the presence of displayMin/Max & rangeMin/Max by assigning them as new local variables, and then set them as undefined value if non-existent.

@bobular
Copy link
Member

bobular commented Mar 24, 2023

Hi @moontrip - I tested in node interactive "shell".
There's a difference between "not defined" and undefined/null.

> _.min([-11,z,0,2,5,-1])
Uncaught ReferenceError: z is not defined

But this is OK

> z = null
null
> _.min([-11,z,0,2,5,-1])
-11

I'm pretty sure we'll be passing a variable to _.min() that is defined, but could contain undefined or null as values.

@moontrip
Copy link
Contributor

@bobular Thank you for your test. Yes you are right. There are two different cases: a) JS variable is not defined; b) JS variable is defined, but its value is not defined (undefined) or null. So even const z; (without defining a value) or z = undefined is the b) case and this will work for lodash _.min([]). But what I was concerned is the a) case. From my experience, either displayMin/Max or rangeMin/Max are not always defined at a variable's metadata: using ? behind a variable name as a typing (e.g., prevents the a) case, which you didn't like it at the hook due to its many usage 😄. Actually, there is a way to check a) case in a JS manner, like typeof z === "undefined", so we can check it before sending it to _.min([]) in order to avoid a potential error: this is what I intended in my previous speaking. But I will check the case you mentioned whether a variable is already defined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants