Align circles in X/Y direction #5
My first reaction is to resist the temptation to support special cases such as the one that you outlined to ensure that the code base remains as simple as possible -- it makes maintenance easier and on-boarding of new contributors such as yourself possible. However, a write-up of your approach could make a great (advanced) example for how to tweak the layout in the documentation. Do you want to have a stab at that?
The placement of the subset labels is currently bugged / inaccurate. They should be placed at the point of inaccessibility, but clearly aren't in your (last) example. In your example, this should result in the subset labels being placed on the x-axis. I suspect this is a precision issue in shapely, but I haven't had/made the time to look into it further.
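Here is a back-of-the-envelope estimate of why a finite `polylabel` tolerance shows up mostly as a vertical offset. It idealizes the geometry as a symmetric lens formed by two circles of radius $r$ centred at $(\pm c, 0)$; the real subset geometries are polygons. The distance $d$ from a candidate label point to the lens boundary is maximal at the origin. It falls off only quadratically when the point moves vertically by $\delta$, but linearly when it moves horizontally by $\varepsilon$:

$$
d(0, \delta) = r - \sqrt{c^2 + \delta^2} \approx (r - c) - \frac{\delta^2}{2c},
\qquad
d(\varepsilon, 0) = (r - c) - |\varepsilon| .
$$

Assume `polylabel` may stop at any point whose distance is within the tolerance $\tau$ of the optimum. Then the label can drift by up to

$$
|\delta| \lesssim \sqrt{2 c \tau}, \qquad |\varepsilon| \le \tau .
$$

For example, $c = 0.5$ and $\tau = 0.01$ give $|\delta|$ up to about $0.1$ but $|\varepsilon|$ only up to $0.01$. This would be consistent with labels that are visibly off the x-axis while still roughly centred horizontally.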
I do dislike how the set labels are currently placed. Basically, I draw a line from the center of mass of the whole diagram through the origin of each set and then place the label on that line just outside the set that is being labelled. While being a decent heuristic that yields OK results in 80-90% of the cases, this leaves a bit to be desired. I have made notes on two-and-a-half other layout ideas.
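The heuristic described above can be sketched as follows. This is not the library's code. The helper name `set_label_positions`, the `offset` parameter, the use of area weights (radius squared) for the "center of mass", and the fallback direction for a set centred exactly on that point are all assumptions made for illustration.

```python
import numpy as np


def set_label_positions(origins, radii, offset=0.1):
    """Hypothetical helper: place each set label on the ray from the diagram's
    center of mass through the set's origin, just outside the circle."""
    origins = np.asarray(origins, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Assumption: weight each circle by its area (up to the constant factor pi).
    weights = radii ** 2
    center = (weights[:, None] * origins).sum(axis=0) / weights.sum()
    directions = origins - center
    norms = np.linalg.norm(directions, axis=1, keepdims=True)
    # A set centred exactly on the center of mass has no preferred direction;
    # fall back to placing its label straight above the circle.
    fallback = np.tile([0.0, 1.0], (len(origins), 1))
    unit = directions / np.where(norms > 0, norms, 1.0)
    directions = np.where(norms > 1e-12, unit, fallback)
    # Step just past the circle's edge along that direction.
    return origins + directions * (radii[:, None] + offset)


print(set_label_positions([[-1.0, 0.0], [1.0, 0.0]], [1.0, 1.0]))
```

For two equal circles side by side, the labels end up to the far left and far right. This is the placement that stops working once the sets are arranged in less symmetric configurations.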
The last idea is not quite sufficient yet, though. It does not handle sets that aren't strict subsets of any single set but are contained in the union of two or more other sets (e.g. {a, b}, {b, c}, {c, d}).
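To make the problematic case concrete, here is a small check that flags such sets. The function name `covered_by_union_of_others` is hypothetical, and it works on plain Python sets rather than on the library's subset encoding.

```python
def covered_by_union_of_others(sets):
    """Return the indices of sets that are not contained in any single other set,
    but are contained in the union of all the other sets."""
    result = []
    for i, s in enumerate(sets):
        others = [t for j, t in enumerate(sets) if j != i]
        if any(s <= t for t in others):
            continue  # plain subset of one other set: the case the idea already covers
        if others and s <= set().union(*others):
            result.append(i)
    return result


print(covered_by_union_of_others([{"a", "b"}, {"b", "c"}, {"c", "d"}]))  # [1]
```

In the example, {b, c} is flagged because it is covered by {a, b} ∪ {c, d}, even though neither set contains it on its own.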
I get that. However, please bear in mind: this would only be one additional parameter to a `Diagram` class and two small additions to `_optimize_layout` (marked with `## NOTE` below):

```python
from __future__ import annotations  # allows `int | float` annotations on Python < 3.10

import warnings
from typing import Mapping, Tuple

import numpy as np
from numpy.typing import NDArray
from scipy.optimize import NonlinearConstraint, minimize
from scipy.spatial.distance import pdist, squareform

from matplotlib_set_diagrams import EulerDiagram


class MyEulerDiagram(EulerDiagram):
    def _optimize_layout(
        self,
        subset_sizes: Mapping[Tuple[bool], int | float],
        origins: NDArray,
        radii: NDArray,
        objective: str,
        verbose: bool,
    ) -> Tuple[NDArray, NDArray]:
        """Optimize the placement of circle origins according to the
        given cost function objective.
        """
        desired_areas = np.array(list(subset_sizes.values()))

        def cost_function(flattened_origins):
            origins = flattened_origins.reshape(-1, 2)
            ## NOTE: Add this line (keeps all origins on the x-axis):
            origins[:, 1] = 0
            subset_areas = np.array(
                [
                    geometry.area
                    for geometry in self._get_subset_geometries(
                        subset_sizes.keys(), origins, radii
                    ).values()
                ]
            )
            if objective == "simple":
                cost = subset_areas - desired_areas
            elif objective == "squared":
                cost = (subset_areas - desired_areas) ** 2
            elif objective == "relative":
                with warnings.catch_warnings():
                    warnings.filterwarnings(
                        "ignore", message="divide by zero encountered in scalar divide"
                    )
                    cost = [
                        1 - min(x / y, y / x) if x != y else 0.0
                        for x, y in zip(subset_areas, desired_areas)
                    ]
            elif objective == "logarithmic":
                cost = np.log(subset_areas + 1) - np.log(desired_areas + 1)
            elif objective == "inverse":
                eps = 1e-2 * np.sum(desired_areas)
                cost = 1 / (subset_areas + eps) - 1 / (desired_areas + eps)
            else:
                msg = f"The provided cost function objective is not implemented: {objective}."
                msg += "\nAvailable objectives are: 'simple', 'squared', 'logarithmic', 'relative', and 'inverse'."
                raise ValueError(msg)
            return np.sum(np.abs(cost))

        # constraints:
        eps = np.min(radii) * 0.01
        lower_bounds = np.abs(radii[np.newaxis, :] - radii[:, np.newaxis]) - eps
        lower_bounds[lower_bounds < 0] = 0
        lower_bounds = squareform(lower_bounds)
        upper_bounds = radii[np.newaxis, :] + radii[:, np.newaxis] + eps
        upper_bounds -= np.diag(np.diag(upper_bounds))  # squareform requires zeros on diagonal
        upper_bounds = squareform(upper_bounds)

        def constraint_function(flattened_origins):
            origins = np.reshape(flattened_origins, (-1, 2))
            return pdist(origins)

        distance_between_origins = NonlinearConstraint(
            constraint_function, lb=lower_bounds, ub=upper_bounds
        )
        result = minimize(
            cost_function,
            origins.flatten(),
            method="SLSQP",
            constraints=[distance_between_origins],
            options=dict(disp=verbose, eps=eps),
        )
        if not result.success:
            feedback = "Could not optimise layout for the given subsets. Try a different cost function objective."
            warnings.warn(f"{result.message}. {feedback}")
        origins = result.x.reshape((-1, 2))
        ## NOTE: Add this line
        origins[:, 1] = 0
        return origins, radii
```
Yes, I stumbled upon that concept of POI in the code. Yes, you're right! Here is my version of `_draw_subset_labels`, with `tolerance = 0.0001`:

```python
from typing import Mapping, Tuple

import matplotlib.pyplot as plt
from numpy.typing import NDArray
from shapely.geometry import MultiPolygon as ShapelyMultiPolygon
from shapely.geometry import Polygon as ShapelyPolygon
from shapely.ops import polylabel


# Method of MyEulerDiagram. `rgba_to_grayscale` is not defined here;
# it is the helper used by the library's own version of this method.
def _draw_subset_labels(
    self,
    subset_labels: Mapping[Tuple[bool], str],
    subset_geometries: Mapping[Tuple[bool], ShapelyPolygon],
    subset_colors: Mapping[Tuple[bool], NDArray],
    ax: plt.Axes,
) -> dict[Tuple[bool], plt.Text]:
    """Place subset labels centred on the point of inaccessibility
    (POI) of the corresponding polygon.
    """
    subset_label_artists = dict()
    tolerance = 0.0001
    for subset, label in subset_labels.items():
        geometry = subset_geometries[subset]
        if geometry.area > 0:
            if isinstance(geometry, ShapelyPolygon):
                poi = polylabel(geometry, tolerance)
            elif isinstance(geometry, ShapelyMultiPolygon):
                # use largest sub-geometry
                poi = polylabel(max(geometry.geoms, key=lambda x: x.area), tolerance)
            else:
                raise TypeError(
                    f"Shapely returned neither a Polygon or MultiPolygon but instead {type(geometry)} object!"
                )
            fontcolor = (
                "black"
                if rgba_to_grayscale(*subset_colors[subset]) > 0.5
                else "white"
            )
            subset_label_artists[subset] = ax.text(
                poi.x, poi.y, label, color=fontcolor, va="center", ha="center"
            )
    return subset_label_artists
```

Here is my code to replicate this figure:

```python
import matplotlib.pyplot as plt

subset_labels = {
    # A*, A, P
    (1, 1, 1): r"$P \wedge A$",
    (1, 0, 1): r"$P \wedge A*$",
    # (1, 1, 0): r"$A \wedge \neg P$",
    # (1, 0, 0): r"$A \wedge \neg P$",
    (0, 0, 1): r"$P \setminus A*$",
}

fig, ax = plt.subplots()
MyEulerDiagram(
    {
        # A*, A, P
        (1, 1, 1): 1,
        (1, 0, 1): 1,
        (1, 1, 0): 1,
        (1, 0, 0): 1,
        (0, 0, 1): 1,
    },
    set_labels=["A*", "A", "P"],
    subset_label_formatter=lambda subset, size: subset_labels.get(subset, ""),
    ax=ax,
)
plt.show()
```
Hmm. So in my example, the …
Presumably,
I think we want to use the smallest number that doesn't cause a substantial increase in running time. I have run some preliminary tests using your example with different tolerance values (albeit without your changes to the optimization):
I think a 5% increase is negligible and a 13% increase is tolerable, but a 55% increase seems too much for a sensible default, given that I can't see much improvement in any of my test cases beyond a tolerance of 0.01. However, to accommodate cases such as yours, we could expose the tolerance parameter as a module-level variable. Then you could set a lower value like this:

```python
import matplotlib_set_diagrams as msd

msd._diagram_classes.POLYLABEL_TOLERANCE = 1e-4
msd.EulerDiagram(...)
```

Have a look at the commit I linked above.
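A rough way to run this kind of tolerance comparison outside the library is sketched below. It times shapely's `polylabel` on a toy lens-shaped polygon and reports how far inside the returned point lies. The geometry, the tolerance values, the repetition count and the helper name `benchmark_polylabel` are arbitrary choices for illustration, not the setup behind the numbers above.

```python
import timeit

from shapely.geometry import Point
from shapely.ops import polylabel

# Toy geometry: the lens-shaped overlap of two unit circles whose centres are 1 apart.
lens = Point(0.0, 0.0).buffer(1.0).intersection(Point(1.0, 0.0).buffer(1.0))


def benchmark_polylabel(polygon, tolerances, repeats=20):
    """Average polylabel run time and label 'depth' for each tolerance."""
    results = []
    for tol in tolerances:
        seconds = timeit.timeit(lambda: polylabel(polygon, tol), number=repeats) / repeats
        point = polylabel(polygon, tol)
        # Distance to the boundary: larger means the label sits deeper inside the shape.
        distance = polygon.boundary.distance(point)
        results.append((tol, seconds, point, distance))
    return results


for tol, seconds, point, distance in benchmark_polylabel(lens, [1.0, 0.01, 1e-4]):
    print(
        f"tol={tol:g}: {seconds * 1e3:.3f} ms, label at ({point.x:.4f}, {point.y:.4f}), "
        f"distance to boundary {distance:.4f}"
    )
```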
Yeah, these are all valid points. I do like the idea of styling the subset and set labels differently.
This makes it easier to distinguish them from one another.
Matplotlib has a similar issue with axis labels and tick labels. It uses font size as the distinguishing factor ("small" for tick labels, "large" for axis labels). I have copied that approach for the time being. Not perfect, but unobtrusive:
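For readers who want the same effect in their own plots, here is a minimal sketch with plain matplotlib. It uses relative font sizes for the two kinds of labels. The positions and label strings are arbitrary, and this is not the library's implementation.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Set label: larger text, placed outside the (imaginary) circle.
set_label = ax.text(-0.5, 1.2, "A", fontsize="large", ha="center")
# Subset label: smaller text, centred inside the region.
subset_label = ax.text(-0.5, 0.0, "12", fontsize="small", ha="center", va="center")

print(set_label.get_fontsize(), subset_label.get_fontsize())
```

Relative sizes such as `"small"` and `"large"` scale with `rcParams["font.size"]`, so the contrast survives global font changes.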
That's what I thought at first, too. I tried an initialization where all circles are placed on a line, but it didn't make a difference in my case... Also, when just clamping y = 0, it makes no difference whether the origins were initially placed on a circle or on a line.
I like that! Or would it make sense to have …

However, if we already know that all the subset labels must land on the x-axis (because all origins are on the x-axis as well), a simpler algorithm could be used: intersect the subset geometry with the x-axis and select the center of the resulting line segment. This would (drastically) increase speed in this special case and would not require increasing the precision in the general case.
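The special-case algorithm can be sketched with shapely as follows. The function name `label_position_on_x_axis` is hypothetical. Choosing the longest segment when a non-convex subset (e.g. a crescent) crosses the axis more than once is an assumption on top of the idea described above.

```python
from shapely.geometry import GeometryCollection, LineString, MultiLineString, Point


def label_position_on_x_axis(geometry):
    """Return (x, 0.0) at the middle of the geometry's intersection with the
    x-axis, or None if the geometry does not cross the axis."""
    minx, _, maxx, _ = geometry.bounds
    # A horizontal line along y = 0 that spans the whole geometry.
    axis = LineString([(minx - 1.0, 0.0), (maxx + 1.0, 0.0)])
    intersection = geometry.intersection(axis)
    if isinstance(intersection, LineString):
        segments = [intersection]
    elif isinstance(intersection, (MultiLineString, GeometryCollection)):
        segments = [g for g in intersection.geoms if isinstance(g, LineString)]
    else:
        segments = []  # the geometry only touches the axis at a point
    segments = [s for s in segments if s.length > 0]
    if not segments:
        return None
    # Non-convex subsets may cross the axis several times; take the longest piece.
    longest = max(segments, key=lambda s: s.length)
    x_start, _, x_end, _ = longest.bounds
    return (x_start + x_end) / 2, 0.0


lens = Point(0.0, 0.0).buffer(1.0).intersection(Point(1.0, 0.0).buffer(1.0))
print(label_position_on_x_axis(lens))  # approximately (0.5, 0.0)
```

Unlike `polylabel`, this needs no tolerance: the only approximation left is the polygonal approximation of the circles themselves.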
That looks good in your examples. In my example, there is the problem that the label for …

For both colored set labels (suggested earlier) and legends, it is necessary to access the set colors. However, they are currently not stored in an attribute but only used in …
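Once the set colors are available, both uses are short in plain matplotlib. In the sketch below, the `set_colors` list stands in for a hypothetical attribute on the diagram object; no such attribute is assumed to exist in the library. The label positions are arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical: colors as they might be stored in an attribute of the diagram.
set_labels = ["A*", "A", "P"]
set_colors = ["tab:blue", "tab:orange", "tab:green"]

fig, ax = plt.subplots()

# Option 1: color each set label like its set.
for i, (label, color) in enumerate(zip(set_labels, set_colors)):
    ax.text(i, 1.0, label, color=color, ha="center")

# Option 2: a legend built from proxy patches, one per set.
handles = [
    Patch(facecolor=color, alpha=0.5, label=label)
    for label, color in zip(set_labels, set_colors)
]
legend = ax.legend(handles=handles)
```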
I haven't messed with the set label placement yet. It obviously still needs work, despite the difference in styling that makes set labels less similar to subset labels.
I guess during the first iteration, the circles are moved onto the x-axis and then remain there.
…#5) The default can thus be changed by subclassing `SetDiagram`.
I have made the `polylabel` tolerance a parameter of `_draw_subset_labels`, so the default can be changed by subclassing:

```python
import matplotlib.pyplot as plt

from matplotlib_set_diagrams import EulerDiagram


class MyCustomEulerDiagram(EulerDiagram):
    def _draw_subset_labels(
        self, subset_labels, subset_geometries, subset_colors, ax, polylabel_tolerance=1.0
    ):
        return super()._draw_subset_labels(
            subset_labels, subset_geometries, subset_colors, ax, polylabel_tolerance
        )


subset_sizes = {
    (1, 0, 0): 1,
    (1, 1, 0): 1,
    (1, 1, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 1,
    (0, 0, 1): 0,
}

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.set_title("Strict polylabel tolerance")
EulerDiagram(subset_sizes, ax=ax1)
ax2.set_title("Relaxed polylabel tolerance")
MyCustomEulerDiagram(subset_sizes, ax=ax2)
plt.show()
```

I am still hesitant to make it an argument of the class initialization. I suspect 0.01 is good enough for most cases, and explaining to users what the parameter does and how to choose a good value would be very involved.
This sounds very good to me!
Currently, set circles are free to move in any direction during layout optimization. However, sometimes this additional degree of freedom is not needed, e.g. in example 5 in the docs:
It would be nice to have the option to restrict the optimization so that the circle centers stay on the X axis (or the Y axis). In my opinion, this could look much cleaner in some cases.
I have played around with `EulerDiagram._optimize_layout`, and it seems to be enough to just set the Y (or X) component of the `origins` array to zero (in the cost function and after the optimization). (I also tried to introduce a penalty for y values, but that does not lead to a complete axis alignment.)

When enabling this, it would also be desirable to place the subset labels on the same axis (x in this case), unlike in my example. And maybe place the set labels on the other axis (top or bottom in my example).
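An alternative to zeroing the Y component inside the cost function is to remove that degree of freedom altogether and optimize only the X coordinates. Below is a minimal sketch for two circles, assuming a target overlap area is given. `lens_area` and `place_on_x_axis` are hypothetical helpers, not part of the library. SciPy's bounded scalar minimizer stands in for the SLSQP setup used by `_optimize_layout`.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def lens_area(d, r1, r2):
    """Area of the intersection of two circles with radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0  # disjoint circles
    if d <= abs(r1 - r2):
        return np.pi * min(r1, r2) ** 2  # smaller circle lies inside the larger one
    a1 = r1**2 * np.arccos((d**2 + r1**2 - r2**2) / (2 * d * r1))
    a2 = r2**2 * np.arccos((d**2 + r2**2 - r1**2) / (2 * d * r2))
    a3 = 0.5 * np.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - a3


def place_on_x_axis(r1, r2, target_overlap):
    """Hypothetical helper: find circle origins on the x-axis whose overlap matches the target.

    The only free parameter is the x-offset of the second circle, so both
    y-coordinates are 0 by construction rather than by clamping.
    """
    result = minimize_scalar(
        lambda d: (lens_area(d, r1, r2) - target_overlap) ** 2,
        bounds=(abs(r1 - r2), r1 + r2),  # from "nested" to "just touching"
        method="bounded",
    )
    return np.array([[0.0, 0.0], [result.x, 0.0]])


origins = place_on_x_axis(1.0, 1.0, target_overlap=1.0)
print(origins, lens_area(origins[1, 0], 1.0, 1.0))
```

With more circles, the same idea means passing an `n`-element vector of x-coordinates to the optimizer and stacking a zero column onto it before computing the geometries.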
I hope I could get across what I mean...
Once more, it is a pleasure to work with your code!