-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for new Annotator API #16
Comments
I strongly support the above proposal. The four goals that Simon outlines above are crucial to making Annotators intuitive to use and allow us to structure the code in a composable way. An Annotator is defined by the dimensions you want to annotate and their dtypes, not by any specific element that shares those dimensions. By declaring this specification ahead of time and having a single annotator support multiple dimensions I believe annotators will be much easier to use and even complex multi-dimensional spaces can be annotated in an intuitive way. Any proposal that relies on an additional object such as the This proposal will also make it possible to annotate multiple plots with shared dimensions, e.g. imagine you have a 1D plot of time and a 2D plot of time and depth. With multiple annotators there was no way to share the time dimension, while this proposal allows for this in a very convenient and natural way. Overall I see the proposal as a generalization of what was already there that only has upsides and zero downsides. Given that I have spent only limited time considering the problem I might have missed something but every angle I've approached this from I find this approach much cleaner both from a usability and a developer/maintenance perspective. |
The proposal:
Here is psuedo code thinking through how the new proposal would look in practice: annotations_df = pd.DataFrame({
'time_start': [10, 30, 60],
'time_end': [20, 40, 70],
'description': ['thing1', 'thing2', 'thing1']
})
curve = hv.Curve((time, eeg_data), kdims=["Time", "Amplitude"])
image = hv.Image((time, eeg_data), kdims=["Time", "Channel"])
connector = SQLiteDB(table_name='something')
spec = {"TIME": np.datetime64, "Amplitude": float}
annotator = Annotator(spec, connector=connector, fields=["description"])
for index, row in annotations_df.iterrows():
annotator.set_range(Time=(row['time_start'], row['time_end']))
annotator.add_annotation(description=row['description'])
annotator.commit()
annotated_curve = annotator * curve
annotated_image = annotator * image
annotated_curve + annotated_image It looks good and simple enough to me! I'm in strong support. The main remaining aspects that I think might need to be considered in the future are how this implementation will lend itself to various UI and styling accommodations, but that's for a later time. |
One thing that I'm not clear on yet both in this proposal and in the current implementation is how you specify whether the annotation along a specific dimension should be expressed as individual points (i.e. HLines/VLines), ranges (i.e. HSpans/VSpans) and/or 2D geometries. |
It looks great to me! No notes. :-) |
In Demetris example, how is this supposed to work? annotated_curve = annotator * curve
annotated_image = annotator * image Assuming ranges, the first one should display as Also, everyone seems to think that Annotators are deeply tied with elements which is simply not true.
This is already the case: there is absolutely minimal use of I would say the core proposal here is that an Annotator can be extended to accept the specs for more than one visual element at once. This does seem reasonable to me, but then I need to know how you actually visualize the annotations over the various elements. I would want to see what the proposed interface is here - if is good, then allowing higher dimension specs when initializing the Rereading @hoxbro's initial comment, I think this is what jumps out to me as problematic (i.e. too uncontrolled and too magic for my taste):
|
That is not the point I was making no, that part is solved by having the annotator infer the appropriate dimensions from the element inside the mul operation. My point was equally valid in the old implementation afaik, i.e. in the 1d case how do you tell it whether the annotation will be a range or a single point, and in the 2d case how do you specify if it'll be a rectangle or a polygon. This is really more of an aside though and can be solved easily enough. I also don't see the strawman, it is true that a single element is tied to an annotator, whether it is used minimally or not that is still part of the design. The parts that are important here are primarily about annotating of shared dimensions and coordinating multiple annotators being awkward and I don't see how these are strawmen in any way. |
This is already solved by setting the
Ok, what is ambiguous here is a HoloViews element instance is not tied to the implementation. If we are clear we are talking about the specs of a single element, then yes, I agree.
Agreed. But this is all dependent on some mechanism whereby annotators change their behavior based on the dimensionality of the thing they are overlaid upon. This is unlike anything we have done in HoloViews before and I'm not sure this is something I am comfortable with. For instance, how would I request just the annotations themselves without an element underneath? |
It's presumably just syntactic sugar over a method that accepts an element or a dimension spec that returns the appropriate overlay. |
Are you suggesting overwriting I guess that would work if people are fine with even more semantics for operator overloading. Ideally users wouldn't think about it or notice what is going on but if the abstraction leaks it could cause more confusion. |
Yes, but again that's more syntactic sugar than anything else and not at the core of the proposal, i.e. even if we decide that's too much magic this proposal makes sense to me. The important part of the proposal is about specifying the full multi-dimensional annotation space so we can avoid coordinating multiple annotators. Do you have any specific concerns about moving in that direction? I can see specific issues that this addresses but I may also be missing new issues that arise due to that change. |
The concern is how to map back from the high dimensional annotator object back to whatever lower dimension thing you are plotting on the screen. You say it is syntactic sugar but I consider this a key thing to understand for this proposal to work well in practice. I admit I hadn't thought of implementing
This does seem like a good idea but only if the API for mapping back to the lower dimensional (sub)spaces is intuitive and easy to use. I think that if done well (e.g helpful error messages when there are name/dtype mismatches), this idea of I'm only spending time thinking about this aspect because the core idea (supporting higher dimensional annotators from the initial constructor) is a good one (and one I'm sure I considered with Simon at some point). I don't have any other objections to this proposal to raise right now. |
As for downsides, one of the original ideas was that you could start with one plot to annotate and later add another joint plot to annotate together as a group. However, that plan didn't quite work out because you need a fixed schema by the time you commit your first annotation: you only had some flexibility in deciding what you were annotating up until that point. In the end, that isn't much flexibility at all so you might as well declare the full dimensionality of what you will be annotating when you create the initial annotator (i.e. this proposal). |
Indeed and in theory this proposal could have a mechanism to expand the spec later but also then would error when committing if the table has already been set up. |
I will also say that I don't like the idea of annotator = Annotator.from_elements(curve, image, fields=["description"]) Most people should not think about the spec as many people will just have an element to annotate. This is also an awkward signature for the common case. I would suggest: Annotator(*spec_or_elements, fields=["description"]) i.e. You can still have a classmethod and use it internally (or explicitly if you really want to) but I don't think that should be the common (and simplest) entrypoint to the system. The classmethod I would want is |
While it is true that @philippjfr If this is the point you were trying to get to, then I do agree that this needs more thought! |
The current way joint specs are represented is something like this i.e. a list of dicts (each corresponds to an element): [{'age':datetime, 'height':float }, {'x':int, 'y':int}] And then the region types are essentially zipped with the spec dictionaries e.g.: ['Range', 'Point'] This is a bit awkward though so let's think of some alternatives! |
For instance, to combine the necessary information in a single [('Range', {'age':datetime, 'height':float }), ('Point', {'x':int, 'y':int})] |
Actually, the more HoloViews-ey way of doing things is probably to compare based on the spec = {'Range':[kdim1, kdim2], 'Point':[kdim1, kdim2]} Where all the dimension instances have their spec = {'Range':curve.kdims, 'Point':image.kdims} I do think this is the proper way to do things long term e.g. a dimension that is Kelvin that disallows negative values in the |
This issue is an alternative to the change proposed in #9 and will only discuss the public API of the
Annotator
class, which will be the hardest to change after the official release of HoloNote.Goals
The goals for improving the
Annotator
class should be:Connector
) should be easily swappable and have only one task, which is to persist the output of the annotator.Current approach
With the current approach, there are three short-comings
Annotator
andConnector
.Annotator
are not well supported. This leads to a complex setup with multipleAnnotators
and a sharedConnector
.Annotator
instances can only handle one element.I have, in issue #6 (commit: c729118), started the process of separating the functionality of
Annotator
andConnector
. These changes are almost feature-complete with the main branch (commit: af0138d). The missing feature is that it is not possible to commit two different elements with differentAnnotator
s as a single commit. The lack of this feature is because the main branch is using theConnector
as the negotiator between the differentAnnotator
s by having methods likeadd_annotation
on it, which is what the PR is actively trying to remove.Currently, this is how it is done on the main branch. The code is split into four steps, which will be referenced later.
My pain with the current implementation is that it is very complex to set up (step B) as it needs one
Connector
and twoAnnotators
with the instance of the connector passed into it. This also uses theconnector
for adding annotations (step D).To handle the separation of concerns, issue #9 proposed a new abstraction layer,
DataSource
:This proposal is a patch solution based on existing code. The approach solves the separation of concern problem, but in doing so, it is even more complicated as you need to know about a third kind of object, which is also a worry described in the issue. It also does not reduce the number of objects.
The current implementation on the main branch also has a strong coupling between an
Annotator
instance and anElement
instance. This seems more fundamental and is not addressed in #9. This is where a lot of the complexity in the setup stems from.New proposal
This new proposal will build upon the work already made in #6, which separates the
Annotator
andConnector
. The first thing would be to remove the strong coupling between an annotator and an element. In practice, this will be done by either having a class method extracting relevant information from an element (or, alternatively, the__init__
method should only extract it). By removing the single-element constraint, we can make theAnnotator
accept specs from multiple elements. This change will mean that step B would look like this:The
spec
dictionary shows the current implementation and could change to incorporate extra information if it is a point or range type.Step C will change so it only accepts keyword arguments:
These changes will also solve the problem currently seen in #6. As step D will just be:
With this change, we only need one object to orchestrate the annotations making for a simpler API. This also means the
Annotator
instance will have all available information and can communicate with the connector instance without a third-party object.Applying annotations to elements should be easy and intuitive if you know HoloViews. Multiplying the
annotator
on an element should automatically detect the element's key dimension(s) and overlay the matching annotations correctly.For this proposal, we will need to rewrite the
Annotator
class to remove the strong coupling of the element and add the proposed features mentioned in this section.The text was updated successfully, but these errors were encountered: