
Core refactoring #46

Merged: 82 commits merged into master on Sep 18, 2021

Conversation

@tkeskita (Owner)

The core_refactor branch contains work described in #45 to refactor object handling and the update system in BVTKNodes. This PR is created already now to advertise the work, although it will likely take a long time to complete.

@tkeskita tkeskita marked this pull request as draft April 10, 2021 13:50
@tkeskita (Owner Author)

I've now upgraded most of the core to such an extent that this is ready for comments! The only thing currently working is placing e.g. a vtkConeSource and an Info node into the node tree. Also, only silent updates (no UI updates) work, and only on the first run. Clearly there is a long way to go, but the basics of the core and updating are now in place.

core.py (resolved review thread)
@NickGeneva

Tuomo,

Another thought I wanted to bring up regarding the new update procedure: do we know if Blender operates the node editor on a single thread? Or is what you have written thread safe? I'm envisioning a case where two auto-update nodes trigger on two different threads, and I wonder if they could enter an infinite update loop of sorts, where one updates the other's upstream, causing the other to update, and vice versa.

Any idea? I know this is kind of an edge case, but I'm curious what your thoughts are.

I think this system should work quite well from what I've read. Nice work!
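One way to rule out the hypothetical mutual-update loop, regardless of Blender's threading model, is a re-entrancy guard that makes a second update request for a node that is already being updated a no-op. The sketch below is purely illustrative (none of these names exist in BVTKNodes):

```python
# Hypothetical sketch, not BVTKNodes code: a guard that breaks update
# cycles by refusing to re-enter a node whose update is in progress.
import threading

class UpdateGuard:
    """Tracks nodes currently being updated so a re-entrant or
    concurrent update request for the same node becomes a no-op."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()

    def try_begin(self, node_id):
        # True if this caller may update the node,
        # False if an update for it is already running.
        with self._lock:
            if node_id in self._in_progress:
                return False
            self._in_progress.add(node_id)
            return True

    def end(self, node_id):
        with self._lock:
            self._in_progress.discard(node_id)

guard = UpdateGuard()

def update_node(node_id, upstream):
    """Update upstream nodes first, then this node; cycles terminate."""
    if not guard.try_begin(node_id):
        return  # update already in progress: break the cycle
    try:
        for up in upstream.get(node_id, []):
            update_node(up, upstream)
        # ... the actual VTK update for node_id would run here ...
    finally:
        guard.end(node_id)
```

Even with a single-threaded node editor, the same guard protects against recursive update triggers (A updates B, which tries to update A again).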

@tkeskita (Owner Author)

@NickGeneva I have no idea if nodes are threaded. I guess not.

Thanks! I'll continue, slow progress..

core.py (outdated), review thread at the line:

    # For all nodes: Do nothing if some of the inputs are not connected.
@tkeskita (Owner Author)

Here's a design question: which input connections should be required to be connected before proceeding to the actual update of a node? Currently all specified input connections are required, but is that too restrictive? Should it require only input names starting with "input"? I think @thomgrand had an example of a node with many optional inputs (can't find it now). For example, vtkGlyph3D has two required inputs (input 0 and input 1) and SourceTransform as an optional input.

@thomgrand

My experience is actually that the computations work even without the optional inputs, e.g. the Stream Tracer:
[figure]

Regarding dynamic inputs:
I think the node(s) you are talking about are the pyvista nodes in the experimental pyvista_features branch. Similar to VTKAppendFilter, the idea was to allow a variable number of inputs, but I looked for an easy way to distinguish them, since in scripts you want to perform computations targeted at each mesh. The nodes generate new input connections when nodes are connected, and move all inputs into a continuous sequence when inputs are disconnected.
[figure]

I attached the base class which handles all of this, which you can use for inspiration.
arbitrary_inputs_helper.txt

But a good question is: when do you want to show optional sockets? I think it's nice that you can immediately see all options. If optional sockets had a different color than the mandatory ones, that would already go a long way.

@NickGeneva

Maybe the best way to do this would be to generate the node classes with the additional information in m_connections. This is kind of related to my suggestion of color coding different output types in #45: basically, figure out a good way of storing more information on input and output connections than just the name.

This could be done through some small data class with attributes such as "name", "type" and "optional" that would be stored in the m_connections list instead of strings. From that point we can run any sort of input connection checks for what's required. This would require the generation of the nodes to be changed though, and I'm not even sure if we can find out which inputs are optional, since I am unfamiliar with this part of the code.
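As a rough illustration of the data-class idea, the sketch below replaces plain name strings with a small record per socket. All names here ("SocketSpec", the example socket specs) are hypothetical, not existing BVTKNodes code:

```python
# Illustrative sketch only: socket metadata instead of bare strings
# in m_connections. Names are made up for the example.
from dataclasses import dataclass

@dataclass(frozen=True)
class SocketSpec:
    name: str
    vtk_type: str = "vtkDataObject"  # expected data type, usable for color coding
    optional: bool = False

# Example: vtkGlyph3D has two required inputs and one optional input.
glyph3d_inputs = [
    SocketSpec("input 0"),
    SocketSpec("input 1"),
    SocketSpec("SourceTransform", "vtkTransform", optional=True),
]

def missing_required(specs, connected_names):
    """Return names of required inputs that are not yet connected."""
    return [s.name for s in specs
            if not s.optional and s.name not in connected_names]
```

With such records in place, both the "is this node ready to update" check and the output-type color coding from #45 could read the same metadata.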

@tkeskita (Owner Author)

Thanks, I think I'll start with a simple rule: if an input name starts with "input", it is required; otherwise it is optional. Let's see how far that goes with the VTK nodes..
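The proposed rule can be sketched as a simple predicate; this is illustrative only, and the function names are made up:

```python
# Sketch of the proposed rule: socket names starting with "input"
# are required, everything else is optional.
def is_required_input(socket_name: str) -> bool:
    return socket_name.lower().startswith("input")

def ready_to_update(input_names, connected_names):
    """True when every required input socket has a connection."""
    return all(name in connected_names
               for name in input_names if is_required_input(name))
```

For vtkGlyph3D this would make "input 0" and "input 1" required while leaving "SourceTransform" optional, matching the example above.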

@thomgrand

I'm not sure if this fits here, but I think it's a good place to start the discussion.
Another issue that I had recently was caching. Currently each computation caches the result at the node and only releases the resources before computing the node again. When building large graphs with multiple converter nodes, you will run into memory issues, with no possibility to release the cached resources except closing and re-opening Blender, which starts with an empty cache again.
The caching is usually nice, since it keeps results for forked nodes of potentially large graphs and allows easy debugging with the Info node, but I feel there should be extended control to release or keep the resources.

Imagine loading large files from the disk. Currently, the file is loaded each time, which is fine if you have e.g. changing data, but sometimes you would like to keep the results cached, especially for heavy computations (e.g. Stream Tracer).
On the other hand, when executing a complete graph (e.g. using Update All), I usually don't want to keep anything cached.

I'm not sure how this could be addressed properly. One could think of a flag on each node (manual, or automatically activated when connecting to an Info node), or a specialized Cache Node that handles everything. What's your opinion?

@NickGeneva

@thomgrand Based on my limited understanding of the VTK pipeline (correct me if I'm wrong), I'm assuming that the actual data is passed into the VTK object on the vtkobj.Update() call? However, I think this Update call isn't necessarily required at every node. If we don't call an explicit Update, VTK will just push the data through the VTK object and not store it (thinking about this Stack Overflow post).

An example of this is the following VTK snippet:

    streamers = vtk.vtkStreamTracer()
    streamers.SetInputConnection(reader.GetOutputPort())
    # ...
    streamers.Update()  # Call Update() because we want to use the output data
    scalarRange[0] = streamers.GetOutput().GetPointData().GetScalars().GetRange()[0]
    scalarRange[1] = streamers.GetOutput().GetPointData().GetScalars().GetRange()[1]
    print("range: ", scalarRange[0], ", ", scalarRange[1])

    tubes = vtk.vtkTubeFilter()
    tubes.SetInputConnection(streamers.GetOutputPort())
    # ...
    tubes.SetVaryRadius(0)
    # No Update() here because we let the downstream handle it.

    streamerMapper = vtk.vtkPolyDataMapper()
    streamerMapper.SetInputConnection(tubes.GetOutputPort())
    # ...

    streamerActor = vtk.vtkActor()
    streamerActor.SetMapper(streamerMapper)

This was one of the reasons I tried to write a custom VTK algorithm for that custom node I was making: the pipeline could then control the update.

So perhaps this is something that should not be a cache feature, but rather control over where .Update() is called in the pipeline (i.e. where we want to get an actual workable object)? Or perhaps a custom storage node is needed that forces the collection of VTK data from upstream. Am I understanding this correctly?

@tkeskita (Owner Author)

@thomgrand do you mean that there is accumulation of unused VTK objects in memory, causing out-of-memory? I'm unsure what kind of scenario you mean by multiple converter nodes.. In core_refactor the VTK object is allocated only once upon node creation, and released upon node deletion. At least there should be no multiple allocations of VTK objects per node.

The update procedure in core_refactor is such that when a node is up-to-date, it becomes out-of-date only when some value in the node is changed or an input connection is changed. Otherwise the node remains up-to-date and will not be updated. You can choose the logic for updating out-of-date nodes by changing the Update Mode property in the Inspect panel. Does that solve this problem?

@NickGeneva definitely the call to vtk_obj.Update() should happen as seldom as possible (only when something has changed and an update is requested). In core_refactor, vtk_obj.Update() is called only when required by the user (depending on the Update Mode chosen). If a call to vtk_obj.Update() is really not required for some VTK class, then it can be made a special node with apply_properties_post(). I'm changing apply_properties() so that it will not call vtk_obj.Update() if apply_properties_post() exists.
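The described status logic could be modeled roughly as follows. This is a hypothetical sketch, not the actual core_refactor code; update_calls stands in for vtk_obj.Update(), and the class/method names are made up:

```python
# Hypothetical model of the described behavior: a node stays up-to-date
# until a property or input connection changes, and apply_properties()
# skips the VTK Update() call when apply_properties_post() is defined.
class Node:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)  # upstream Node objects
        self.up_to_date = False
        self.update_calls = 0       # stands in for vtk_obj.Update()

    def apply_properties_post(self):
        return None                 # overridden only in special nodes

    def on_change(self, all_nodes):
        # A changed value marks this node and everything downstream dirty.
        self.up_to_date = False
        for n in all_nodes:
            if self in n.inputs:
                n.on_change(all_nodes)

    def update(self):
        if self.up_to_date:
            return                  # nothing changed: skip entirely
        for up in self.inputs:
            up.update()             # bring upstream up-to-date first
        if type(self).apply_properties_post is Node.apply_properties_post:
            self.update_calls += 1  # normal node: call vtk_obj.Update()
        else:
            self.apply_properties_post()  # special node: no Update() call
        self.up_to_date = True
```

The key property is that repeated update requests are free until something actually changes, which is what keeps vtk_obj.Update() calls to a minimum.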

@thomgrand commented Apr 28, 2021

I'm also no expert, but here's how I understood it.

@thomgrand Based on my limited understanding of the VTK pipeline (correct me if I'm wrong), I'm assuming that the actual data is passed into the VTK object on the vtkobj.Update() call?

This depends on the setup of the actual node in question. Many VTK nodes are of type vtkAlgorithm. They keep their output, even if they're only called by another node's Update, if the ReleaseDataFlag is set to off. I don't know what the default value is, but I guess it's false.
https://vtk.org/Wiki/VTK/FAQ#Using_ReleaseDataFlag

By default VTK keeps a copy of all intermediate results between filters in a pipeline. For a pipeline with five filters this can result in having six copies of the data in memory at once. This can be controlled using ReleaseDataFlag and GlobalReleaseDataFlag. If ReleaseDataFlag is set to one on a data object, then once a filter has finished using that data object, it will release its memory. Likewise, if GlobalReleaseDataFlag is set on ANY data object, all data objects will release their memory once their dependent filter has finished executing.

If you enable it, I'm not sure if you can directly get their output, which would make automatic array extraction in e.g. Color Mapper impossible.

This would also mean that deleting the VTK object in the BVTKcache probably has no effect, unless you also delete the vtkAlgorithm itself. (I think this is actually done currently)

So perhaps this is something that should not be a cache feature, but rather control over where .Update() is called in the pipeline (i.e. where we want to get an actual workable object)? Or perhaps a custom storage node is needed that forces the collection of VTK data from upstream. Am I understanding this correctly?

The problem is that this does not only depend on the user, but some nodes seem to need a VTK object on their input to work correctly (e.g. Info and Color Mapper). I'm not yet sure where the program should store these, or the vtkAlgorithms.

@thomgrand do you mean that there is accumulation of unused VTK objects in memory causing out-of-memory? I'm unsure what kind of scenario you mean by multiple converter nodes.. In core_refactor the VTK object is allocated only once upon node creation, and to be released upon node deletion. At least there should be no multiple allocation of VTK objects per node.
The update procedure in core_refactor is such that when a node is up-to-date, it becomes immediately out-of-date only when some value in node is changed or input connection is changed. Otherwise node remains up-to-date and will not be updated. You can choose logic of updating out-of-date nodes by changing the Update Mode property in Inspect panel. Does it solve this problem?

There are indeed no multiple allocations of VTK objects. My point is that they are never freed.

Imagine having two large meshes that you want to process using BVTK. You create two node "pipelines"
A -> B -> C
D -> E -> F
where A, D are the file loaders, C, F are the converters, and B, E are some filters in between. After executing both C and F, all VTK objects have been allocated, the data has been loaded, and every node has its output cached. In case of a large dataset, this can already fill up your memory quite well.
But I see no way the user can now easily empty these cached objects and get back the memory. If you press Update on either C or F, the pipeline caches will be emptied and (iirc) all vtk objects will be newly allocated, freeing the memory before consuming it again.

On the other hand, if the output of A is very large, while B filters the data down to a small size (or if B takes very long to compute), it could be nice if one could tell BVTK to use the cached results of B to try out subsequent filters (e.g. only Glyph visualization where we play around with the scale).

For the large meshes, I currently restart Blender once I converted all large meshes, but it would be nice to have some (limited) control over this behavior inside the actual node tree.

@NickGeneva commented Apr 29, 2021

This depends on the setup of the actual node in question. Many VTK nodes are of type vtkAlgorithm. They keep their output, even if they're only called by another node's Update, if the ReleaseDataFlag is set to off. I don't know what the default value is, but I guess it's false.

Ahh, I see. Yes, this makes sense. Indeed, this would be tricky even if we were to turn it off and stop calling Update, since some nodes need a vtkobj to update their contents.

Right now, pre-refactor, maybe a special node would be a good method, as this would allow the user to define a pseudo root node for all downstream objects: it just holds a vtkobj with the data in it. This would also be very nice for tinkering with downstream items.

But post-refactor, maybe the flags can be used? The problem is that if we release all data automatically, we're back to computing the whole graph over and over. So maybe the special cache/root node would still be a good addition. Nodes that need vtkobj info to draw would have to be written to store the needed info locally (with the new update status this would be possible and easier, since we can now have events for input updates).

I'm assuming that to get rid of the data arrays in the algorithm objects, we would call ReleaseData() on the output. Maybe this is something to address post-refactor, when we have better event handling on node updates?

@tkeskita (Owner Author)

Well, I think core_refactor could be modified so that it's not necessary to create VTK objects until the actual update. That way it could be possible to write an operator which executes each node island separately and releases its VTK objects before continuing with the next island. Or implement an Update Mode which discards data on node islands other than the one last modified.
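Finding node islands amounts to computing connected components of the node graph. A minimal pure-Python sketch (not Blender or BVTKNodes code; the function name is made up):

```python
# Split a node graph into "islands" (connected components), so each
# island could be updated and its VTK objects released before moving on.
def node_islands(nodes, links):
    """nodes: iterable of hashable node ids; links: iterable of (a, b) pairs."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, islands = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first flood fill from an unvisited node.
        stack, island = [start], set()
        while stack:
            n = stack.pop()
            if n in island:
                continue
            island.add(n)
            stack.extend(adjacency[n] - island)
        seen |= island
        islands.append(island)
    return islands
```

An operator could then iterate over the returned islands, update each one, and release its VTK objects before processing the next.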

@thomgrand commented Apr 29, 2021

I think calling ReleaseData() has some potential caveats: the object itself may still be referenced by other parts of the program, and from my experience with VTK you might run into a segfault if any other object tries to access the data (unconfirmed). The clean way would be to delete all references to the vtkobj across all nodes.

I'm unsure if it's currently necessary to keep all the algorithms/filters as well as their outputs alive. I think it'd be fine if the default BVTK caching stays as it is (compute each node and cache each output associated with the node), but the following options would be really nice to have:
- Auto Empty Cache (bool): Only nodes that explicitly require a vtkobj on their input will have their predecessor cached. All other nodes will have their associated vtkAlgorithm and output destroyed after the queried update has finished, not earlier, since we may re-use the results of some nodes. (Global flag activated somewhere.)
- Empty Cache (action/operator): Simply empties the current cache and deletes all vtkAlgorithm instances to get the resources back.
- Cache Node (bool): If the node has a cached object, its predecessors and the node itself will not be updated; the cached output will be forwarded instead. The node has some color indication or symbol showing that it's using the cache. Maybe an option similar to Edit Custom Code in the Properties tab could be implemented.

Some of these are probably harder to implement than others, but the empty cache would already be very helpful for large datasets.
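The Empty Cache operator and the per-node Cache Node flag could be modeled roughly like this. All names are hypothetical, and the sketch ignores the vtkAlgorithm lifetime question discussed above:

```python
# Hedged sketch of the proposed cache controls (names are made up):
# an "Empty Cache" operator plus a per-node flag that forwards a
# cached output instead of recomputing the upstream chain.
class NodeCache:
    def __init__(self):
        self._outputs = {}  # node_id -> cached output data

    def store(self, node_id, data):
        self._outputs[node_id] = data

    def get(self, node_id):
        return self._outputs.get(node_id)

    def empty(self):
        """'Empty Cache' operator: drop every cached output."""
        self._outputs.clear()

def compute(node_id, upstream, cache, flagged, run):
    """Return node_id's output; nodes in `flagged` forward their cache."""
    if node_id in flagged and cache.get(node_id) is not None:
        return cache.get(node_id)  # Cache Node: skip self and predecessors
    inputs = [compute(u, upstream, cache, flagged, run)
              for u in upstream.get(node_id, [])]
    result = run(node_id, inputs)  # stand-in for the node's VTK work
    cache.store(node_id, result)
    return result
```

Flagging B in an A -> B -> C chain means a later update of C reuses B's cached output without touching A, which is the "play around with downstream filters" use case described above.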

@tkeskita (Owner Author) commented May 1, 2021

FYI, I've now come to a point where the current enum lists are blocking the way, so next I'm gonna try out the paired text property + enum selection operator mentioned in the second post of #45 and a comment in #54. The idea is that the enum property value is stored in a StringProperty, with a small button next to it which launches an enum list generator. Gonna try this out with the Color Mapper node..

@tkeskita (Owner Author) commented May 9, 2021

Note: I'm currently busy with other things, progress is even slower than normal.

@tkeskita (Owner Author) commented May 15, 2021

Today's commit includes a working prototype of the alternative enum list approach for Color By in the Color Mapper node. Next I'm trying to finetune this approach. Edit: since this affects only dynamic enum lists, I suppose it is acceptable to create a custom update function like color_by_set_value() for dynamic enum lists? So I propose to use this for dynamic enum lists.
[image]

@tkeskita (Owner Author)

I haven't found a property for changing node title bar color, so currently using Node color.

@tkeskita (Owner Author) commented Jun 7, 2021

OK, I'm now somewhat confident that this system will work. I added an optional function get_vtk_output_object_special() which special nodes (like Multi Block Leaf) can use to provide their VTK output data object, if they don't use a normal vtkAlgorithm. Please check it out: does it provide enough flexibility, at least for most cases?

Note: I was thinking to next upgrade the Time Selector, and after that try to rebase onto current master 😬. From there on, there should be no changes to master before core refactoring has been merged! After this rebase the door is open for further modifications.

@NickGeneva

I haven't found a property for changing node title bar color, so currently using Node color.

I did a bit of digging and you seem to be correct: at least right now, the label color appears to be hard-coded and locked out of the Python API.

An alternative could be to use the icon of the node label to communicate the VTK status. It seems this was implemented in Blender version 2.83. But the general node color works fine imo.

Perhaps the icons can be used to denote node types like readers, writers, filters, converters. 🤔

tkeskita added 7 commits June 13, 2021 14:30
- Modified vtkCache and added nodeCache to BVTKCache
  for direct mapping between nodes and VTK objects.
  Access by BVTKCache.get_node_from_vtkobj() and
  BVTKCache.get_vtkobj_from_node().
- Renamed node_status to vtk_status in BVTKNode
  and added enumerations with descriptions.
- Currently broken, work in progress.
- Added nomenclature to __init__.py, trying to harmonize naming
- vtk_obj pointer in node (don't know yet if this works)
- initialize_vtk() to create VTK object
- new set of socket-node-VTK navigation functions
- WIP: pretty much everything is now broken
- Documenting failed attempt to get object pointer to node:
- Adding property "vtk_obj: object" can be done but the value is lost
  and can't be accessed via node access later on.
- Using PointerProperty requires usage of Blender types, didn't get
  this to work either. Looks like cache.py is the easiest solution.
- Lots of reversals from previous failed attempt commits,
  and streamlining cache for updates in core.
- Restored node_id property to BVTK node class, to facilitate
  mapping of nodes to VTK objects held by BVTKCache.
- cache: Modified rebuild logic to reuse node_id if possible.
- Calling Update() added to node code.
  Node is to cover its own initialization and update code fully.
- First ideas for node updating and change propagation:
  update_vtk() is to update upstream nodes and current node.
  notify_downstream() is to signal change downstream.
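The two-way node-to-VTK-object mapping mentioned in these commits could look roughly like this. This is an illustrative sketch, not the actual BVTKCache implementation; only the two getter names come from the commit messages:

```python
# Sketch of a two-way cache mapping node ids to VTK objects and back,
# mirroring BVTKCache.get_vtkobj_from_node() / get_node_from_vtkobj().
class BVTKCacheSketch:
    def __init__(self):
        self._vtk_by_node = {}  # node_id -> VTK object
        self._node_by_vtk = {}  # id(VTK object) -> node_id

    def map_node(self, node_id, vtk_obj):
        """Register the pairing in both directions."""
        self._vtk_by_node[node_id] = vtk_obj
        self._node_by_vtk[id(vtk_obj)] = node_id

    def get_vtkobj_from_node(self, node_id):
        return self._vtk_by_node.get(node_id)

    def get_node_from_vtkobj(self, vtk_obj):
        return self._node_by_vtk.get(id(vtk_obj))

    def unmap_node(self, node_id):
        """Drop both directions, e.g. on node deletion."""
        vtk_obj = self._vtk_by_node.pop(node_id, None)
        if vtk_obj is not None:
            self._node_by_vtk.pop(id(vtk_obj), None)
```

Keeping the reverse map keyed by id() avoids requiring VTK objects to be hashable, at the cost of having to keep both dictionaries in sync on every map/unmap.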
@tkeskita (Owner Author)

OK, getting close to the merging point. Next I'll upgrade the test cases, do some final cleaning, kick up version numbers and then merge with master!

- Fix: Check for not vtk_obj in get_vtk_array_data().
- Replaced run_update_all() with check_node_statuses()
  to check that all nodes are up-to-date.
- Upgraded test json and blend files.
- Kicked up version number.
- Added ideas for future development from core_refactor.
- Removed obsolete code.
@tkeskita tkeskita marked this pull request as ready for review September 18, 2021 11:57
@tkeskita tkeskita merged commit 770fc00 into master Sep 18, 2021