Thoughts about our approach #3

almarklein · 2021-10-17T09:19:32Z

almarklein
Oct 17, 2021
Maintainer

I have been thinking about our goals, and believe that the way we have now formulated them is flawed. It would be good to take a step back, try to define what the problem is that we're trying to solve, and move from there. Below I'll argue why I think that the way we have formulated our approach is problematic, and propose a slightly different approach. If you squint it's still the same, depending on how literally you take the original wording :)

What is the problem we're trying to solve?

In my words: the situation is that we have an ecosystem with many awesome visualization libraries that are each good at certain tasks, but none of them provides a complete package for all the visualization needs of a scientist. The problem that this poses for the scientist is that multiple APIs must be learned in order to fulfil the different needs. It's also hard to try out different tools because of the cognitive burden of learning a new API.

What is our goal?

We formulated our goal earlier as: To design a new high-level API for scientific visualization, and to implement that API for different backends, including Datoviz, Pygfx & Matplotlib.

This formulation already includes the approach. Perhaps we can phrase it in more general terms: To provide a way for a user to use the different visualization libraries in a way that minimizes the cognitive load of switching between them.

What is our approach to realize this goal?

Some possibilities:

1. We make an intersection-style unified-plotting-api, such that code written in it always works on all viz-libraries (backends).
2. We make a union-style unified-plotting-api: an API like (1) but larger, so it includes API that is specific to certain viz-libraries.
3. We make it easier for a user to switch from one viz-library to another (as needs change), by aligning their individual plotting-apis.
4. Something else?

The way we've formulated it now seems to point towards (1) or (2). Spoiler: I'm going to argue that (1) and (2) are not a good approach.

Dealing with feature incompatibilities

edit: in this section I assumed the viz-libraries were not restricted to gpu-viz-libraries. If they are, the intersection is much more complete, and we should probably just do that.

One of the biggest difficulties of this project, I think, is how to deal with incompatibilities between the viz-libraries. Each library will have its own set of features and abstractions.

The intersection of those sets is quite narrow. If we confine ourselves to only the intersection (option (1)), our API has little advantage; from the POV of each viz-library, our API is less capable than the plotting-api of the library itself. This will not help getting traction. Plus one important point of this project was to let users make use of the different strengths of the viz-libraries, because none of them scratches all itches.

Therefore, creating a unified-plotting-api that covers the intersection of features (i.e. works on all backends) is not very useful.

This means that the new unified-plotting-api will provide the user with features that are not always available on all viz-libraries. In other words, there is a chance that the code that a user writes does not work on certain backends, works on only one specific backend, or does not work at all because no backend covers all the API that is being used (e.g. a pie chart and volume rendering).

This is a problem, since it somewhat defeats the purpose of this project. Earlier, a user would use different tools with different APIs to build different kinds of visualizations. Now there would be one API, but you'd need to learn (or carefully keep track of) which parts of the API can be used where. How are we going to make it clear what parts of the API work where? We can provide a compatibility matrix in the docs, but do we expect users to keep that on the side the whole time?

This also poses a problem when sharing code (e.g. online). If you copy code from StackOverflow, that code may only work on a specific backend (which you may not have installed). More so, if you combine multiple code samples, there is a chance that one sample requires one backend, while another sample requires another, causing the combined code to not work at all.

In short: option (2) would be something that looks like a single API, but it's not: it would be a union of APIs, and you'd always have to be aware of which viz-library (backend) you intend to run your code on.

Therefore, creating an API that covers the union of features (i.e. may or may not work on a certain backend) is also not very useful.
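To make the intersection/union trade-off concrete, here is a minimal sketch using plain Python sets. The feature sets below are invented for illustration and are an assumption, not a statement of what each library actually supports; the function names are hypothetical as well.

```python
# Hypothetical feature sets, invented for illustration only. They do not
# describe what these libraries actually support.
FEATURES = {
    "matplotlib": {"line", "scatter", "image", "pie_chart"},
    "datoviz": {"line", "scatter", "image", "volume"},
    "pygfx": {"line", "scatter", "image", "mesh", "volume"},
}


def intersection_api(features):
    """Option (1): only the features that every backend supports."""
    return set.intersection(*features.values())


def union_api(features):
    """Option (2): every feature that at least one backend supports."""
    return set.union(*features.values())


def backends_that_can_run(used, features):
    """Names of the backends that cover every feature the user code relies on."""
    return sorted(name for name, supported in features.items() if used <= supported)


if __name__ == "__main__":
    print("intersection:", sorted(intersection_api(FEATURES)))
    print("union:", sorted(union_api(FEATURES)))
    # Code that uses only volume rendering runs on a subset of the backends.
    print("volume only:", backends_that_can_run({"volume"}, FEATURES))
    # Code that combines a pie chart with volume rendering is valid against
    # the union API, yet may run on no backend at all.
    print("pie + volume:", backends_that_can_run({"pie_chart", "volume"}, FEATURES))
```

With these made-up sets, the intersection is small, and user code written against the union can end up with an empty list of backends able to run it, which is the situation described above.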

Unique but aligned APIs

I think we may have to drop the idea of the unified-plotting-api (one API with multiple backends), and replace it with the idea of having multiple aligned plotting-apis (aligned as in made similar).

We could come up with a "proposed API specification", and each viz-library implements its own variant of that (subset/superset). They will be considered individual APIs, each with their own docs, although they will have certain parts in common.

Users import an API specific to one viz-library at the top of their code. This makes it clear what viz-library this code will run on. The GUI will help the user (docs, autocompletion, etc.) in a way specific to that API, and the user will use http://rtd.org/specific-viz-library/high-level-api. The function calls between different APIs often look the same, but return values would be objects specific to the viz-library, allowing the user to drop into the lower-level mechanics of that library if more control is needed.

With a bit of willpower, simple viz code can be made to work on different viz-libraries by only changing the import statement. But this is not the main point. The point is that users are able to understand code from different viz-libraries quickly, because they use the same constructs (e.g. for interaction callbacks), and similar function calls.

This helps break down the silos between the different viz-libraries. A scientist can write high performance code (via e.g. Datoviz or pygfx), create publication quality figures for a paper (via e.g. MPL), and create a visualization for a blog (via e.g. plotly), using APIs that are different, but familiar.
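A rough sketch of what "aligned" could look like in practice. Everything here is hypothetical: `fastviz` and `paperviz` are stand-ins for two library-specific high-level APIs (simulated with `SimpleNamespace` so the example is self-contained), and `line` is an assumed shared function name, not an agreed specification.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class FastLine:
    """Pretend line object of a GPU library, with a knob only it offers."""
    x: list
    y: list
    antialias: bool = True


@dataclass
class PaperLine:
    """Pretend line object of a publication-oriented library."""
    x: list
    y: list
    dash_pattern: tuple = ()


# Stand-ins for two separately shipped modules. Both expose `line(x, y)` with
# the same call signature, but each returns its own library-specific object.
fastviz = SimpleNamespace(line=lambda x, y: FastLine(list(x), list(y)))
paperviz = SimpleNamespace(line=lambda x, y: PaperLine(list(x), list(y)))


def make_plot(api):
    """Simple viz code that only touches the aligned part of the API."""
    return api.line([0, 1, 2], [0, 1, 4])


if __name__ == "__main__":
    # Switching `api` mimics changing the import at the top of a script.
    for api in (fastviz, paperviz):
        obj = make_plot(api)
        print(type(obj).__name__, obj)

    # Dropping down to library-specific control: this only exists on FastLine.
    fast_line = make_plot(fastviz)
    fast_line.antialias = False
```

The call in `make_plot` is identical for both stand-ins, while the returned objects differ, which is where the user would drop into library-specific functionality.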

Summarizing

I think that the way that we have initially formulated our goal/approach may have been too ambitious. Not because it will be hard to do, but because it will be impossible to do without making concessions that undermine our initial purpose.

We should reconsider the idea of creating a unified-plotting-api with existing viz-libraries as backends, and instead look at creating multiple "aligned" plotting-apis, that have some parts that are equal, use the same familiar constructs, and can still leverage the power of the specific viz-library that they are part of.

There may also be other solutions that I have not thought of.

djhoese · 2021-10-17T19:33:58Z

djhoese
Oct 17, 2021
Maintainer

@almarklein Thanks for writing this up. You bring up some good points, but I'm having trouble distinguishing the differences between your preferred approach and the API approaches you argue against. I'll try to organize this, but I'm not super confident it'll happen...

My Understanding

My understanding of something like a vispy 2.0 was that it is one way of doing things (sure, call it an API) where different "graphics backends" are implemented by meeting a minimal set of requirements. You could call this set of requirements an "aligned" API. These requirements were supposed to be implemented as "primitive" Visuals. These should be things that any library should reasonably be able to implement. Any extra features can be implemented as separate Visuals for better performance, but theoretically any high level Visual can be made from low-level Visuals. This may be hard to implement but it also should limit differences from backend to backend.

I guess this could be thought of as VisPy is the frontend API, the primitive Visuals are the "middleware", and the graphics backends are...the backend API. The backend APIs don't have to be the same or even similar as long as we can implement the "middleware".
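As a minimal sketch of that layering, assuming hypothetical names throughout (`Backend`, `draw_line`, `draw_markers`, and `LineWithMarkers` are made up for illustration and are not existing VisPy API):

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """The "middleware" contract: primitives every backend must implement."""

    @abstractmethod
    def draw_line(self, points):
        ...

    @abstractmethod
    def draw_markers(self, points):
        ...


class RecordingBackend(Backend):
    """Toy backend that records draw calls instead of rendering anything."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def draw_line(self, points):
        self.calls.append(("line", list(points)))

    def draw_markers(self, points):
        self.calls.append(("markers", list(points)))


class LineWithMarkers:
    """A high-level visual built purely from primitive visuals."""

    def __init__(self, points):
        self.points = list(points)

    def draw(self, backend):
        # The high-level visual never touches backend internals; it only
        # uses the primitives that every backend is required to provide.
        backend.draw_line(self.points)
        backend.draw_markers(self.points)


if __name__ == "__main__":
    visual = LineWithMarkers([(0, 0), (1, 1), (2, 4)])
    for backend in (RecordingBackend("backend-a"), RecordingBackend("backend-b")):
        visual.draw(backend)
        print(backend.name, [kind for kind, _ in backend.calls])
```

A backend could still override a high-level visual with a faster native version; the sketch only shows the fallback path through primitives.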

Against unique and aligned

A big part of this proposal was to update VisPy even if that is only the front-facing part of the proposal so that the backend libraries can get improved. While the VisPy API might change, the overall concepts are similar to what already exists. Users who are familiar with VisPy shouldn't have to change much about how they do things, but will suddenly get to use newer technologies (vulkan, wgpu).

Users import an API specific to one viz-library at the top of their code. This makes it clear what viz-library this code will run on.

When you get to the above point it doesn't seem like you are doing things to better the APIs between the libraries. Each library ends up doing its own thing and being slightly different, which leaves users in the same spot as if you/we had tried to make a central API. An API that defines how differences are to be handled may work better than one that says "yeah there will be differences". Also, for the current libraries being discussed (datoviz, pygfx/wgpu), are we expecting major differences in functionality? They draw stuff/things/Visuals on the screen and they do it efficiently.

The function calls between different APIs often look the same, but return values would be objects specific to the viz-library, allowing the user to drop into the lower-level mechanics of that library if more control is needed.

At what point is a similar API just a single API? These "separate" APIs would probably have shared/common code so would that go in VisPy? How much low-level access do you expect a user to need? If this is a shared high-level API then there should be a high-level access point for getting to those low-level things.

Previous work

What you talked about is something the pyviz.org group (https://pyviz.org/) originally attempted to nail down early on, but I can't seem to find the pull requests from back then. The idea was to define the behavior for very high level interfaces of various visualization libraries (ex. a "save" function that can save to an image on disk). If every library had these high level interfaces then it reduces the learning curve for users switching between them.

Found the github repository: https://github.com/pyviz/spec/pulls

Other questions

Are we considering collaboration with other libraries like bokeh? holoviews? Or are we sticking with "graphics" libraries? If so, then should matplotlib be included?

0 replies

almarklein · 2021-10-17T22:04:21Z

almarklein
Oct 17, 2021
Maintainer Author

@djhoese thanks for the reply.

Are we considering collaboration with other libraries like bokeh? holoviews? Or are we sticking with "graphics" libraries? If so, then should matplotlib be included?

IIRC the proposal started out as an API for graphics backends (in particular Datoviz, Pygfx and Vispy.visuals). And I think that if this were the scope, option (1) that I mentioned (an API that is the intersection of functionality, so it always works on all backends) is viable, because the overlap is quite large. And also because none have a high-level API yet :)

It seems, however, that along the way the scope has increased, because at some point we included at least Matplotlib. This is when it gets tricky, and then the worries that I expressed apply.

The fact that we're now discussing this is a sign that it's not very clear what the problem statement, goals, and scope are :)

When you get to the above point it doesn't seem like you are doing things to better the APIs between the libraries. Each library ends up doing its own thing and being slightly different which leaves users in the same spot as if you/we had tried to make a central API. An API that defines how differences are to be handled may work better than one that says "yeah there will be differences".

This is a valid point. I've come to believe that there is no perfect solution. But I do believe that individual but similar APIs are preferable to a library/API that pretends to expose a single API while in fact it's a mix of APIs, parts of which may or may not work on a specific backend. It reminds me of OpenGL ;) You could consider this as an argument to keep the scope confined to graphics APIs.

How much low-level access do you expect a user to need?

All of it :) But seriously, while some users will only use the high level API, most users will want more control at some point. What would that look like? Would they have to drop the high level API completely? With the "aligned APIs" idea, each library can make this transition in their own (relatively smooth) way. Though I can also see this work with a unified API if it's constrained to graphics APIs.

What you talked about is something the pyviz.org group (https://pyviz.org/) originally attempted to nail down early on

Oh wow, I did not know this. One major difficulty with such an attempt, I realize now, is that it would be hard to convince devs of other libraries to play along and implement your dictated API :)

6 replies

almarklein Oct 18, 2021
Maintainer Author

I strongly disagree with this idea

What idea do you mean exactly?

What I meant to say was that there is a substantial group of users that would want more control as their needs grow or when they grow more accustomed to the tool. We have to think about how we will facilitate this. A very obvious way is to let the user "drop down" into the lower-level API. E.g. start using visuals directly instead of using vispy.plot. If you have a unified-plotting-api (targeting multiple backends) then this "dropping down" means that the user will have to choose what viz-library to go with.

djhoese Oct 18, 2021
Maintainer

Yeah sorry, I meant the joking "all of it" idea. More seriously it just sounded to me like you expected users to need to drop down to the low-level in most cases. If the API is designed well then hopefully this is a rare occurrence and the high-level API acts as the "middle man" for the more obvious use cases.

almarklein Oct 18, 2021
Maintainer Author

I guess I'm less convinced that these occurrences are rare. E.g. with matplotlib.pyplot you can use the functional api, but I suspect that many users also work with the (lower level) objects that these functions return, once such users become more experienced. The fact that a user can do this, and do so in a somewhat smooth way (using the high level api, and the low level objects), is an advantage that we should not underestimate, imo.

djhoese Oct 18, 2021
Maintainer

If by functional you are saying the matlab-style plt.X interface then the MPL devs have long said that this shouldn't be used and new users (not coming from matlab) should use the object-oriented interface which is more pythonic and as you say gives you more control. I never said that a high-level API couldn't be objects or object-oriented. In the matplotlib case, the object oriented API is the high-level API and is designed to give users as much access to the plotting functionality as possible (Figure, Axes, etc). It also ignores the specifics of the matplotlib backends until you get down to the very nitty-gritty details of embedding things in a GUI backend or maybe how fonts are handled (just a guess, not sure this is actually an issue). The MPL object oriented API, while having multiple object types, still has a very limited set of objects that the basic use cases have to deal with. For example: create a Figure, create one or more Axes, call ax.X for either plot or imshow or one of the other visualization methods, save or show the result. For the most part I don't think users have to deal with Artists or the matplotlib backends directly.
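For readers less familiar with matplotlib, the workflow described above looks roughly like this. It is only a small sketch, assuming matplotlib is installed; the data values are made up, and the Agg backend is chosen here just so the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Create a Figure and one Axes.
fig, ax = plt.subplots()

# Call ax.X for the visualization; ax.plot returns a list of Line2D objects.
(line,) = ax.plot([0, 1, 2, 3], [0, 1, 4, 9], label="samples")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# The returned object can be adjusted further when more control is needed.
line.set_linewidth(3)

# Save the result; a file path would work here as well as a buffer.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Nothing in this snippet refers to a specific rendering or GUI backend apart from the explicit `matplotlib.use` call at the top.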

So while you said:

but I suspect that many users also work with the (lower level) objects that these functions return

I'm pointing out that this set of objects is the API and has been what MPL devs suggest for a while now. This is in line with what I said about the high-level API providing access to the "middle man" that will then act on the backend. The low-level access is abstracted away into the high-level interface to the point that each "component" of the visualization is encapsulated into its own type of object.

I would also say this vispy 2.0-style API may not be plotting specific. Just like Visuals are not plotting specific and just like a lot of the visualizations users make with vispy are not on a ticked and labeled set of axes. This is why including matplotlib for the overall Visual API may be a bad idea, but having an object-oriented plotting API similar to matplotlib may not be a bad idea (but may not be the only API).

almarklein Oct 19, 2021
Maintainer Author

I see what you mean. I think I may have had assumptions about our high-level API that were not really based on much.

I would also say this vispy 2.0-style API may not be plotting specific.

True. We call it the "plotting-api" but it would not be specific to plotting.

almarklein · 2021-10-18T13:21:11Z

almarklein
Oct 18, 2021
Maintainer Author

Cyrille, Nico and I discussed this this morning. We came up with some definitions (#5) to hopefully help make these discussions less "ambiguous". I updated some of the terms that I used in the original post.

I'll also try to summarize my thoughts here (read #5 first):

If we would make a unified-plotting-api using only gpu-viz-libraries as backends, I think there'd be enough overlap, so that the plotting-api can be the intersection. I think this could work.

If instead we would aim to make a unified-plotting-api using generic viz-libraries as backends, then the overlap between these backends would be much smaller. If this unified-plotting-api would be an intersection of the features, it would be very limited, and getting traction would be hard, because ... why would anyone use it? If instead this unified-plotting-api would expose the union of features, you get into a situation where depending on what features are used, user code may only work on specific backends.

That last point was (as I found out by thinking it through) the source of an uneasy gut feeling that I had, and the main reason for starting this discussion. I don't believe a feature matrix in the docs would help. See more arguments in my initial post.

I'll try to word this in the form of a proposal:

We should probably stick to gpu-viz-libraries (Datoviz, Pygfx, VisPy). These still need a plotting-api, and their feature set is/will be close enough that a unified-plotting-api can probably work.
If we'd want to include other viz-libraries (e.g. matplotlib) then I feel quite strongly that a unified-plotting-api cannot be made to work. I made a suggestion to create multiple aligned plotting-apis instead, but I don't see us creating these for e.g. matplotlib and plotly, nor people adopting them. So in the end, I think I'm proposing to stick to gpu-viz-libraries 😄

4 replies

djhoese Oct 18, 2021
Maintainer

Sounds good. Something else which isn't exactly different than the intersection idea but maybe another way of putting it: you should make the API you want (what the user wants) and any "backend" that doesn't fit that isn't a "complete" backend. Don't change the API just because one of the libraries you're working with (pygfx, datoviz) doesn't support something. Change the library to match what is best for the API.

For example, let's say we implement an API that depends on there being lines, meshes, and volumes, but a new library called gpu-extreme(!) comes along that wants to act as a backend and it doesn't have Volumes. It could probably implement enough of the API that it works in most cases, but to be considered a "vispy 2.0" backend or whatever, it would need to implement Volumes or depend on one of the other backends to provide that functionality when needed.

I'm not sure how this would work in the real world, but it is going to happen. A contributor is going to come along and want to add a feature that only works for a subset of the backends, but is clearly a benefit to the entire API. The incompatible library is either going to need the functionality added, or the API is going to need to require the user to access the backend library directly to use it (this is annoying if it happens a lot), or the API becomes one that works for some but not all existing backends.
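One way to picture the "complete backend" idea is a small conformance check. This is only a sketch under assumptions: the required set of visuals is taken from the lines/meshes/volumes example above, and the function names and the `gpu_extreme` feature list are hypothetical.

```python
# Visuals the API depends on, following the example above (assumed set).
REQUIRED_VISUALS = frozenset({"line", "mesh", "volume"})


def missing_visuals(implemented):
    """Return the required visuals that a backend does not implement."""
    return REQUIRED_VISUALS - set(implemented)


def is_complete_backend(implemented):
    """A backend is "complete" only if it implements every required visual."""
    return not missing_visuals(implemented)


if __name__ == "__main__":
    # Hypothetical backends and what they implement.
    candidates = {
        "gpu-extreme": {"line", "mesh"},  # no volumes yet
        "other-backend": {"line", "mesh", "volume", "text"},
    }
    for name, implemented in candidates.items():
        status = "complete" if is_complete_backend(implemented) else "incomplete"
        print(name, status, sorted(missing_visuals(implemented)))
```

The API itself stays fixed in this picture; a backend either catches up with `REQUIRED_VISUALS` or is reported as incomplete.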

almarklein Oct 18, 2021
Maintainer Author

This is a good point. I agree that certain features may be exposed even if not all backends support them. But I would add the word "yet" - only if we can reasonably expect all backends to support that feature in the foreseeable future. In other words, I argue that a feature being not fully supported should be a temporary situation.

but a new library called gpu-extreme(!) comes along that wants to act as a backend and it doesn't have Volumes.

I suppose we could call this an experimental backend.

A contributor is going to come along and want to add a feature that only works for a subset of the backends, but is clearly a benefit to the entire API.

I feel quite strongly that this is where we should draw the line. If this happens, I'd recommend that the contributor create a new library in which to expose these features. Somewhat of a backend-specific extension to vispy2. This way the boundaries stay clear.

djhoese Oct 18, 2021
Maintainer

In other words, I argue that a feature being not fully supported should be a temporary situation.

Temporary in open source can be a long time 😉 which may end up looking like the API supports backends not fully supporting every feature. Not disagreeing, just pointing out the blurry lines.

almarklein Oct 19, 2021
Maintainer Author

Ouch! It's true and it hurts.
