Scatterplot: allow continuous overlay variables (needs to optionally support gradient colormaps) #1399

Closed
d-callan opened this issue Oct 5, 2022 · 30 comments · Fixed by #1455 or #1484

@d-callan
Contributor

d-callan commented Oct 5, 2022

mbio beta diversity app will need a scatterplot with a gradient colormap. but i don't think (@danicahelb could confirm) that clinepi wants gradient colormaps. so our scatter viz needs to be able to support them, but they should be optional.

my (possibly flawed) memory from ann is that the components necessary to build a gradient colormap are available in web-components, but i haven't yet confirmed.

@danicahelb

clinepi would be happy to use gradient colormaps to allow continuous vars to be used as the overlay on scatter plots

@dmfalke changed the title from "Scatterplot: needs to optionally support gradient colormaps" to "Scatterplot: allow continuous overlay variables (needs to optionally support gradient colormaps)" Oct 14, 2022
@danicahelb

prioritize for immediately following b60 release since mbio needs this for beta div

@bobular
Member

bobular commented Oct 27, 2022

@d-callan - are we envisaging always using binning for continuous overlay vars?

@d-callan
Contributor Author

nope. its a true gradient

@d-callan
Contributor Author

image

an example of the goal, though with our own color palette(s). to start, i'd imagine we have a diverging one, like in the example above, that is applied if the metadata indicates the range crosses 0, and a non-diverging one for when it doesn't. it won't be perfect but gets us a decent start.
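
A minimal sketch of the rule described above, in TypeScript: use a diverging palette when the variable's range crosses 0, otherwise a sequential one. `GradientKind`, `NumericRange` and `chooseGradientKind` are hypothetical names for illustration, not existing web-components exports.

```typescript
// Hypothetical names throughout; not taken from the actual client code.
type GradientKind = 'diverging' | 'sequential';

interface NumericRange {
  min: number;
  max: number;
}

// A range "crosses 0" only when it has values strictly on both sides of zero.
function chooseGradientKind(range: NumericRange): GradientKind {
  return range.min < 0 && range.max > 0 ? 'diverging' : 'sequential';
}

console.log(chooseGradientKind({ min: -3, max: 5 })); // diverging
console.log(chooseGradientKind({ min: 0, max: 5 })); // sequential
```

Treating a range that merely touches 0 as sequential is a choice made here, not something the thread specifies.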

@d-callan
Contributor Author

also see #306 for prev thinking

@d-callan
Contributor Author

i'd also be fine leaving numeric variables with <9 values on a categorical palette for now if we don't want to tack on dynamically discretizing one of our continuous palettes yet. can make that a separate issue.
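
A rough sketch of that fallback, assuming "<9 values" means fewer than 9 distinct values among the points being plotted (the thread does not say whether it is distinct values or a metadata count). `OverlayPalette`, `CATEGORICAL_VALUE_LIMIT` and `choosePalette` are made-up names.

```typescript
// Hypothetical helper; the threshold interpretation is an assumption.
type OverlayPalette = 'categorical' | 'gradient';

// "<9 values" from the comment above.
const CATEGORICAL_VALUE_LIMIT = 9;

function choosePalette(values: number[]): OverlayPalette {
  const distinctCount = new Set(values).size;
  return distinctCount < CATEGORICAL_VALUE_LIMIT ? 'categorical' : 'gradient';
}

console.log(choosePalette([1, 2, 2, 3])); // categorical
```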

@bobular
Member

bobular commented Oct 27, 2022

Thanks for the extra info. I see Ann did a lot, but it would likely be a challenge to merge the client work now. Good for reference though.

Is the back end (plot.data) ready to go for continuous (unbinned) overlay, but the constraints for the passthrough scatterplot app do not allow it? Connor will be looking into this. I guess while testing he can spoof or ignore the constraints temporarily?

@d-callan
Contributor Author

Yea plot.data is ready, and I can get constraints in the data service on a branch somewhere without too much of a headache I think. @chowington when do you need that by?

@chowington
Member

I haven't gotten anything spun up yet---still soaking up previous work, so no rush!

@chowington self-assigned this Nov 1, 2022
@chowington
Member

@d-callan I might be ready to start testing this by tomorrow, FYI!

@d-callan
Contributor Author

d-callan commented Nov 1, 2022

@danicahelb

This does not work for ClinEpi

I cannot select a continuous overlay variable (for example "Household wealth index, numerical")

Image

When I choose a categorical overlay variable with numeric values I (incorrectly) get the gradient legend but the colors don't match what is being plotted

Image

Also, may be related to #1481 & #306

@danicahelb

danicahelb commented Dec 2, 2022

The bug has been fixed. I can now:
(1) select continuous overlays & get gradient color map
(2) select categorical overlays with numeric values & get discrete color map

I potentially found a new bug, though maybe this is just how it is implemented? @chowington can you take a look?

In GEMS1, BMI-for-age z-score ranges from -15 to +273, but when i use this term as an overlay, the legend indicates it ranges from -30 to +30. (I thought it could have something to do with not having data for the participant whose BMI-for-age z-score was 273 but checked and this is not the case)

Image

same thing with MUAC-for-age z-score which ranges from -7 to +15... legend shows range from -30 to +30

Image

perhaps this is something to do with the manual range annotations for GEMS? I don't see it happening in other studies. Axis range for most z-scores on the subsetting tab is -30 to +20, and it does look like diverging color maps are being centered on 0, in which case -30 to 30 would make sense. but BMI-for-age z-score axis range on the subsetting tab is -30 to 275, so i don't know why the legend for this one is -30 to +30

@danicahelb

danicahelb commented Dec 2, 2022

The implementation details need to be discussed in an upcoming UX or dataViz meeting.

1. as the colormap is a true gradient, outliers are highlighted and everything else is washed out.

  • for most overlay variables, this approach is good for exploring outliers but not useful for seeing differences in the distribution of the majority of the data.
  • Consider dividing the overlay range into deciles of equal frequency.

WASHb BNG:
Image

WASHb BNG:
Image

GEMS1:
Image
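
The decile suggestion above could look roughly like the following: compute bin edges so that each bin holds about the same number of points, then color by bin index instead of raw value. This is only a sketch; `quantileEdges` and `binIndex` are hypothetical names, and linear interpolation between sorted values is just one of several quantile conventions.

```typescript
// Hypothetical helpers, not part of the existing client code.

// Edges of `bins` equal-frequency bins (bins = 10 gives deciles).
function quantileEdges(values: number[], bins: number): number[] {
  if (values.length === 0 || bins < 1) {
    throw new Error('need at least one value and one bin');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const edges: number[] = [];
  for (let i = 0; i <= bins; i++) {
    // Fractional position in the sorted array, interpolated linearly.
    const pos = (i / bins) * (sorted.length - 1);
    const lower = Math.floor(pos);
    const upper = Math.ceil(pos);
    edges.push(sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower));
  }
  return edges;
}

// Index (0 .. bins-1) of the bin a value falls into; values at or beyond
// the last interior edge land in the final bin.
function binIndex(value: number, edges: number[]): number {
  const bins = edges.length - 1;
  for (let i = 1; i < bins; i++) {
    if (value < edges[i]) return i - 1;
  }
  return bins - 1;
}
```

With this scheme a single extreme value (such as the 273 z-score mentioned above) only stretches the last bin, so the bulk of the data still spans the full set of colors.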

@d-callan
Contributor Author

d-callan commented Dec 2, 2022

I can add ironing out details for this to a dataviz agenda. I'd also like to discuss discretizing the gradient for low cardinality numeric variables and ordinal variables.

@danicahelb

  1. discretized color maps are currently always centered on 0... is this what we always want?

@danicahelb added the bug (Something isn't working) and dataViz topic labels Dec 2, 2022
@bobular
Member

bobular commented Dec 2, 2022

> I potentially found a new bug, though maybe this is just how it is implemented? @chowington can you take a look?
>
> In GEMS1, BMI-for-age z-score ranges from -15 to +273, but when i use this term as an overlay, the legend indicates it ranges from -30 to +30. (I thought it could have something to do with not having data for the participant whose BMI-for-age z-score was 273 but checked and this is not the case)

I've had a quick look. The code is creating a symmetrical gradient around zero and is using the curated displayRangeMin and displayRangeMax in preference to the data-derived rangeMin and rangeMax. (This preference has been standard practice until now, I believe.) So this is why we get -30 to 30 (because it creates a symmetric range using max(abs())).
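
A sketch of the behaviour described above, for reference. The field names `displayRangeMin`, `displayRangeMax`, `rangeMin` and `rangeMax` come from this thread; the `VariableRanges` interface and `symmetricOverlayRange` function are hypothetical, and this is not a copy of the actual client code.

```typescript
// Hypothetical shape; only the four field names are taken from the thread.
interface VariableRanges {
  rangeMin: number;
  rangeMax: number;
  displayRangeMin?: number;
  displayRangeMax?: number;
}

// Prefer the curated display range, then mirror the larger magnitude
// around zero so the gradient midpoint sits at 0.
function symmetricOverlayRange(v: VariableRanges): { min: number; max: number } {
  const lo = v.displayRangeMin ?? v.rangeMin;
  const hi = v.displayRangeMax ?? v.rangeMax;
  const extent = Math.max(Math.abs(lo), Math.abs(hi));
  return { min: -extent, max: extent };
}

// Data range -7 to 15 with a curated display range of -30 to 20
// (assuming that annotation applies to this variable).
console.log(symmetricOverlayRange({
  rangeMin: -7,
  rangeMax: 15,
  displayRangeMin: -30,
  displayRangeMax: 20,
})); // { min: -30, max: 30 }
```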

image

You might be asking "Why aren't the outlier values showing up as dark brown points?" - well they are there but they are overplotted with other less exciting points (the x and y variables are quantised to 1 decimal place I think). Here I've done some heavy subsetting on high-BMI-z-score values and they do show up nice and brown

image

Note also that there is 0.7 opacity applied to all points (but this may change when @moontrip's opacity work is done) so everything looks a bit washed out anyway.

@bobular
Member

bobular commented Dec 2, 2022

Also, when there are very few points remaining after very heavy subsetting (e.g. subset on BMI-for-age z-score: 14 to 276) we get the categorical colour scheme. Ah, I see this is what @d-callan has already commented on here: #1399 (comment)

@bobular
Member

bobular commented Dec 2, 2022

My final comment, and this should probably be a new ticket, is that we probably want the mouse-over information to include the "z" axis value for continuous overlays?

@bobular
Member

bobular commented Dec 2, 2022

> 1. discretized color maps are currently always centered on 0... is this what we always want?

Not sure what you mean here.

@chowington
Member

Thanks @bobular for pointing out the reason for the z-score discrepancy. So should we use the range of the actual data instead of the curated range?

Also, a couple of UX issues I just noticed:

  1. The symmetric gradient around 0 has a practically white color for 0. Points with this color wouldn't show up, so I think we should update the gradient to have no super light colors.
  2. If the markers have some amount of transparency, the gradient legend should also have that amount of transparency so that the colors actually match up.
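
One way to do point 2, sketched under the assumption that the legend is drawn from a list of RGB gradient stops: emit each stop as an `rgba()` string with the same opacity as the markers. `RGB`, `withOpacity` and `legendStops` are hypothetical names; the 0.7 value is the marker opacity mentioned earlier in this thread.

```typescript
// Hypothetical helpers for illustration only.
type RGB = [number, number, number];

// CSS rgba() string for a color at the given opacity (clamped to 0..1).
function withOpacity([r, g, b]: RGB, opacity: number): string {
  const alpha = Math.min(1, Math.max(0, opacity));
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}

// Apply the marker opacity to every stop of the legend gradient.
function legendStops(stops: RGB[], markerOpacity: number): string[] {
  return stops.map((stop) => withOpacity(stop, markerOpacity));
}

console.log(legendStops([[140, 81, 10], [1, 102, 94]], 0.7));
```

Overplotted markers stack their opacity, so dense regions will still look darker than the legend.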

@bobular
Member

bobular commented Dec 2, 2022

I'm not sure. As usual there are three choices:
a) actual range of data on screen (which we may not yet be calculating on the client, but can, of course)
b) full range of data in dataset (rangeMax/Min)
c) annotated range (displayRangeMax/Min)

using c) helps a bit towards the problem Danica reported that "everything is grey" - b) would exacerbate it. Switching to decile/normalised colours would need to go through a dataviz and/or UX meeting I think.
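
To make the three options concrete, here is a small sketch in which the choice is an explicit parameter. `RangeSource`, `OverlayVariable` and `overlayRange` are hypothetical; only the `rangeMin`/`rangeMax` and `displayRangeMin`/`displayRangeMax` field names come from the discussion. Falling back to the dataset range when there are no visible points or no annotation is an assumption.

```typescript
// Hypothetical types and function; only the range field names are from the thread.
type RangeSource = 'visible' | 'dataset' | 'annotated'; // options a), b), c)

interface OverlayVariable {
  rangeMin: number;
  rangeMax: number;
  displayRangeMin?: number;
  displayRangeMax?: number;
}

interface NumericRange {
  min: number;
  max: number;
}

function overlayRange(
  source: RangeSource,
  variable: OverlayVariable,
  visibleValues: number[] = []
): NumericRange {
  const dataset: NumericRange = { min: variable.rangeMin, max: variable.rangeMax };
  switch (source) {
    case 'visible': {
      // a) range of the points currently on screen
      if (visibleValues.length === 0) return dataset;
      let min = visibleValues[0];
      let max = visibleValues[0];
      for (const v of visibleValues) {
        if (v < min) min = v;
        if (v > max) max = v;
      }
      return { min, max };
    }
    case 'dataset':
      // b) full range of the data in the dataset
      return dataset;
    case 'annotated':
      // c) curated range, falling back to the data range if absent
      return {
        min: variable.displayRangeMin ?? dataset.min,
        max: variable.displayRangeMax ?? dataset.max,
      };
  }
}
```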

  1. Agree
  2. Agree (and I had thought of that too, but didn't write it down)

@chowington
Member

Do we need to discuss this in a meeting, or does someone have a strong opinion?

@d-callan
Contributor Author

d-callan commented Dec 6, 2022

I'm planning on discussing these colormaps in an upcoming dataviz meeting. I'd be fine to leave it as currently implemented until then.

@asizemore
Member

Is anyone already working on @chowington 's point about the light colors in the gradient? If not that'd be something I can take on this week!

@bobular
Member

bobular commented Dec 19, 2022

Not that I know of @asizemore !

@danicahelb

@asizemore does a ticket for the light gradient colors exist? if not, can you make one?

@danicahelb reopened this Jan 14, 2023
@asizemore
Member

I would think this ticket is done? Scatterplot now supports gradient colormaps/continuous overlay vars.

There's still ongoing discussion about which colormaps to use and how to improve them, but I've made that into a separate ticket: VEuPathDB/web-components#442

@danicahelb

danicahelb commented Jan 27, 2023
