-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up and revamp functionality for dumping / exporting graphs. #308
Comments
Thanks for opening this issue and writing down your thoughts, as well as offering to devote some time to fix this problem. Reading through your comments, it seems that you would like to move away from graphviz and rather go down the xml road. This is opposite to what I was thinking: I would rather double down on the graphviz approach and abandon the xml approach for the following reason:
In contrast, graphviz would bring the following benefits to the table:
However, the one benefit of the rvsdg viewer was that it was tailored towards the RVSDG and had a few features that I did not find in other graphviz viewers so far (but I have also not looked very hard):
One additional argument for improving the graphviz support in jlm is that the RVSDG is not the only data structure that would benefit from this. We are currently using graphviz for the following data structures:
Thus, I would rather invest in graphviz than coming up with our own format again. RVSDG Viewing SupportAs I see it, we have currently two ways of viewing the RVSDG:
What I would like to do is to replace the xml format with graphviz + of-the-shelf viewer(s) in the next step. Further down the road, I would like to have a viewer that is RVSDG tailored with (some of the) features from the current RVSDG viewer. I fully agree with you that this should be a website similar to compiler explorer, i.e., one can upload an RVSDG graphviz, get it rendered, and can interact with it for debugging purposes etc. What do you think? |
You bring up interesting points, and I am definitely not a big XML fan. When I was looking into debug output earlier, I tried to emit some graphviz, but I had the following issues:
I did however not look into other ways of viewing GraphViz files, the file format itself seems quite extensible, and |
I found graphviz to be quite usable for the HLS dialect in combination with a customized version of xdot.py. The main thing that would be convenient for larger graphs is collapsible structural nodes. For the old XML viewer I found it convenient that it included addresses of nodes (though I changed those to hex) and inputs and outputs (though only in the XML), since that helped with finding where exactly in the graph errors occurred. |
I created a proof of concept of a generalized dot output based on what I have described above. |
Ultimately, my goal would be to have a replacement for the rvsdg-viewer, but I currently do not find the choice of viewer the most pressing issue. In my opinion, there is quite some ground to cover in jlm before we should discuss the viewer question. What I envisioned is the following:
Point 1 and 2 above are the minimum in order to replace the string concatenation mess that we have all over the place in jlm. This would already go a long way to improve the situation. Once this is in place, we can tackle the viewer question. It might be that there are already viewers out there that can already do what we would like, but I do not know (yet). My personal wish would have been a viewer on a website where we/everyone can just provide an rvsdg dot file, it renders the graph, and one can interact with it. I am not sure what is out there for doing this (or if we have to build our own), but d3-graphviz seems like a good starting point. |
I can see why you would want a clean and generic data model for dot graphs, especially when targeting it from different areas of the compiler. |
@haved : What is your opinion to the comments above? |
I think using GraphViz as the main output format makes sense, and the examples shown by @davidmetz show that there are practical ways of encoding ports for inputs and outputs. It is also cool if the structural nodes show which inputs and outputs are passed in and out of subregions. A viewer should absolutely be able to minimize subregions, though. I'm my experience from trying to make some readable graphs, axis aligned edges with 90° turns are the easiest to read, if laid out to not overlap. When looking at a region, I'm not sure it makes for the most readable graph to have all arguments in a line at the top, and could imagine some floating markers to reference arguments anywhere, but that could be an optional thing in custom viewer software. Once I have time, the first thing I'll do is look at the dot files @davidmetz used to get his graphs. I agree that we should have a programmatic way of building the data model, and not string concat. |
@haved Did you find any time to tackle this issue yet? |
I have been experimenting with programmatically rendering into svgs from JavaScript for a viewer, although I realize that might not be the best order to go about things. The next time I'm working on this I'll be trying to make a C++ API and data model for graph outputs, supporting subregions and ports. I think we agree on how such an API should look. |
While designing an API and the GraphViz format the output eventually ends up as, I have had to consider how regions should be handled. One option is to use subgraphs/clusters, which "flattens" the hierarchical graph, and draws borders around regions to indicate region membership. Nodes can not contain subgraphs, so structural nodes become subgraphs, possibly with sub-subgraphs for each region, with ports added in a way different to simple nodes. To preserve a sense of hierarchy, one can apply a scale to make deeper nesting smaller. The nice thing about this approach is that you can display the entire program using just I have also started to consider emitting multiple named graphs in the file, and use metadata to link structural nodes, which become actual GraphViz nodes, to regions. The reason I started thinking about several named graphs in the first place was that the user could like to view the RVSDG together with some other graph, like the PointsToGraph, so it would make sense to output them together in the same file, and let the viewer then highlight sematic connections across graphs (e.g. registers referring to specific outputs). This question is of course not really that important in terms of the API design of the graph output, and both can be options for the final output, I just wanted to share my thoughts. |
The rvsdg viewer is able to show multiple regions at once. One can click on a structural node and expand/hide its content, and it will be shown together with the parent region. This works quite nicely, enabling the user to hide the subgraphs from structural nodes that are uninteresting and only drill down into the nodes that are of interest. I have to say that I do not really have a strong opinion at the moment without having worked with either of the options. My gut feeling tells me that the multiple-graph option might be more future proof, but that is just a feeling. The reason why I am saying this is that it would/should work better with bigger graphs. For smaller graphs, we still have the ascii output that does the job reasonably well. So a dot graph is not strongly needed there (but would certainly be nice to have). Once the graph becomes bigger, one would like to have some possibility to hide uninteresting subgraphs from the users view, and then some kind of viewer support is needed anyway. That being said, we currently do not have the possibility to emit RVSDG dot graphs at all, so either option is a step forward. The dot representation of an RVSDG is not set in stone. If it turns out that the option we chose does not do the trick, then we can still iterate on it easily as long as the programmatic model supports the possibilities we want to utilize from dot. |
We have the start of an MLIR backend, and it seems like there is support for generating dot graphs from MLIR dialects I don't know how the generated dot looks like, and we might want additional functionality or visualize things differently. Just wanted to throw it out there. It would of course limit the graphs to the IR itself and not have support for, e.g., the points to graph. |
Interesting, I would like to see an example of output so I'll try to either find it or make one. MLIR supports both regions and control flow within them, while we get away with only regions, so it will be interesting to see how they handle the extra distinction that we don't have to make. |
@haved Did you find time to make any progress on this front? |
@phate I've been doing this today, will probably "finish" it tomorrow, and let you give your feedback while I resume on the Andersen solver using the work list, which is also pretty much finished. EDIT: okay so finish tomorrow was a bit optimistic (after the first 90% come the last 90%), will come back with a PR soon™ |
…g as ASCII or dot (#403) This is the API I ended up with, it uses some `friend` declarations to ensure internal consistency, so the public API should be hard to misuse. You create **graphs**, where you can add **nodes** and **edges**. These can all have labels and arbitrary attributes set, which will be included in the dot output. The ASCII output ignores attributes for brevity. There are also **ArgumentNode**, **ResultNode** and **InOutNode** that are nice when representing data flowing into the graph, through nodes, and out of the graph. InOutNodes have an arbitrary number of input ports and output ports, which can also have labels and attributes. They can also have subgraphs. The InOutNodes use the HTML table labels in dot. In dot, subgraphs are rendered as separate graphs, so I was planning on customizing a JavaScript library for GraphViz to make graphs inside nodes render nicely when requested. The ArgumentNodes and ResultNodes have the option of being connected to ports in outside graphs. Otherwise, edges must go between ports within the same graph. All GraphElements have the option of being mapped to a program object (any type of pointer), which serves two purposes: You can look up elements by program object, to aid in building the graphs, and you can refer to program objects by pointer in attributes, and have the pointer be replaced by the short unique id of the graph element. I'm very open to suggestions for changes to the interface, internals or the output, as this should be something usable for everybody. There is currently a helper function for setting the background color on a node/port. I could envision a lot more helper functions, to make it easier for the end user to find useful and correct attributes, e.g. ``` edge.SetAttribute("style", "dashed"); // becomes edge.SetStyle(EdgeStyle::Dashed); ``` The code takes care to escape strings in both regular dot and the HTML sub-language. I will of course add tests to all the functionality that it makes sense to test. Under follows some examples of API usage, and the corresponding output in ASCII and rendered GraphViz: <details> <summary>A regular graph</summary> ```cpp jlm::util::GraphWriter writer; auto & graph = writer.CreateGraph(); graph.SetAttribute("rankdir", "LR"); auto & node = graph.CreateNode(); node.SetLabel("DELTA"); node.SetAttribute("shape", "oval"); auto & node2 = graph.CreateNode(); node2.SetLabel("ALLOCA"); auto & edge = graph.CreateDirectedEdge(node, node2); edge.SetAttribute("style", "dashed"); edge.SetAttribute("minlen", "3"); edge.SetAttribute("headlabel", "Pointee"); edge.SetAttribute("taillabel", "Pointer"); auto & node3 = graph.CreateNode(); node3.SetLabel("ALLOCA2"); auto & edge2 = graph.CreateUndirectedEdge(node2, node3); edge2.SetLabel("Unified"); ``` becomes ``` { node0:DELTA node1:ALLOCA<-[node0, node2] node2:ALLOCA2<-node1 } ``` and ![image](https://github.com/phate/jlm/assets/3748845/822ebfdc-7c4d-46a8-bc41-63a1756f6b40) </details> <details> <summary>A dataflow graph</summary> ```cpp jlm::util::GraphWriter writer; auto & graph = writer.CreateGraph(); auto & arg0 = graph.CreateArgumentNode(); arg0.SetLabel("CTX"); arg0.SetFillColor("yellow"); auto & res0 = graph.CreateResultNode(); auto & node0 = graph.CreateInOutNode(0, 1); node0.SetLabel("\"Cool Name!\""); node0.SetFillColor("gray"); auto & node1 = graph.CreateInOutNode(3, 3); node1.SetLabel("<Main Node>"); node1.GetOutputPort(1).SetFillColor("yellow"); auto & node2 = graph.CreateInOutNode(2, 1); node2.SetLabel("End Node"); graph.CreateDirectedEdge(arg0, node1.GetInputPort(2)); graph.CreateDirectedEdge(node0.GetOutputPort(0), node1.GetInputPort(1)); graph.CreateDirectedEdge(node0.GetOutputPort(0), node1.GetInputPort(2)); graph.CreateDirectedEdge(node1.GetOutputPort(0), node2.GetInputPort(0)); graph.CreateDirectedEdge(node1.GetOutputPort(1), node2.GetInputPort(1)); graph.CreateDirectedEdge(node2.GetOutputPort(0), res0); ``` becomes ``` { ARG a0 o0 := "\"Cool Name!\"" o1, o2, o3 := "<Main Node>" [], o0, [a0, o0] o4 := "End Node" o1, o2 RES o4 } ``` and ![image](https://github.com/phate/jlm/assets/3748845/bbeabe51-4be9-49cf-befb-a08557885d15) </details> <details> <summary>A graph with a subgraph</summary> ```cpp jlm::util::GraphWriter writer; auto & graph = writer.CreateGraph(); auto & source = graph.CreateNode(); auto & node = graph.CreateInOutNode(3, 3); graph.CreateDirectedEdge(source, node.GetInputPort(0)); graph.CreateDirectedEdge(node.GetOutputPort(0), node.GetInputPort(2)); auto & subgraph = node.CreateSubgraph(); auto & arg0 = subgraph.CreateArgumentNode(); auto & arg1 = subgraph.CreateArgumentNode(); arg0.SetOutsideSource(node.GetInputPort(0)); arg1.SetOutsideSource(node.GetInputPort(2)); auto & res0 = subgraph.CreateResultNode(); auto & res1 = subgraph.CreateResultNode(); res0.SetOutsideDestination(node.GetOutputPort(2)); res1.SetOutsideDestination(node.GetOutputPort(1)); subgraph.CreateDirectedEdge(arg0, res1); subgraph.CreateDirectedEdge(arg1, res0); ``` becomes ``` { node0:NODE o0, o1, o2 := NODE node0, [], o0 { ARG a0 <= node0, a1 <= o0 RES a1 => o2, a0 => o1 } } ``` and ![image](https://github.com/phate/jlm/assets/3748845/85d2e8c8-ad52-49f1-ad73-2e59d17ee3a6) </details> This PR is part of #308 --------- Co-authored-by: Nico Reissmann <[email protected]>
Apologies, but I realized I had mis-read the dot language spec. Unlike in HTML/XML, all attribute values must always be quoted. This PR fixes this. It is part of #308 Co-authored-by: Nico Reissmann <[email protected]>
As discussed in #297, adding overloads for passing maps around manually is not a good solution for increasing observability, though was helpful in the short run.
What would an ideal interface for dumping internal state look like? I'm currently aware of the following functions:
rvsdg::view()
functions implemented injlm/rvsdg/view.cpp
, which takes any region and gives SSA outputs that give names to all outputs (and arguments) and shows their uses:jlm-opt
's--xml
flag, and corresponding xml export, defined injlm/rvsdg/view.cpp
. The xml can be opened in the rvsdg-viewer. The xml does not include information about arguments being mapped to the structural node's inputs, but we could add that.PointsToGraph::ToDot
which creates a Graphviz output of the PointsToGraph, that can be rendered withdot
.What would be nice to have?
The reason I made #297 was to be able to connect
RegisterNode
s to corresponding outputs in the dumped RVSDG. The same could also be done toAllocaNodes
,LambdaNodes
etc.The more information we add to the PointsToGraph output, and the larger the program, the less pleasant it becomes in GraphViz, and it makes more sense to have interactive viewer software, so I think adding an xml output option for the PointsToGraph with all this information could make sense.
When the files are all xml, we can even place the different xml outputs in the same .xml file, allowing associated outputs to be located in the same file, reducing the possibility of confusing version mismatch between files.
What would an improvement look like?
In practice I think
ToDot
can be removed from thePointsToGraph
class, makingPointsToGraph.cpp
more focused on logic.I was thinking having a general class with helpers for emitting output, that can be configured to be either xml, dot or a simple textual format. My previous attempts at using dot for rvsdg were not successful, so we might consider removing it once there is great viewer software for the xml output.
This class could also include mappings between pointers to more human readable unique ids, and the mappings would stay across invocations. When using viewer software, these ids are not that important, but when using textual output, they are. I'm guessing traversal order is not consistent across invocations of the compiler, due to all the unordered_maps, so the mappings would still not make the output "stable".
My attempts at a WebViewer
I have experimented a bit with an html output, where the output itself is pretty much just the xml (and could in practice be the same as xml), and the "magic" happens in JavaScript. The website it produces lets me move nodes around, scale them, resize and zoom into regions and move nodes in there etc. The UX is however quite unpleasant, since there is no auto-layout, and too many ways of doing the same thing, and mouse clicking starts breaking down in weird ways when zooming too far into nested regions. There should also be a way of hiding edges through named "teleporters" or something.
I would very much like to retry the web-viewer, however, possibly using svg internally instead. It is very low-effort to open html files, but even more importantly, it would be extremely nice to have a compiler explorer-like website for jlc and jlm-opt. Including the option of showing PointsToGraph information would also be nice.
Planning ahead
I would be open to do work on this after I'm done with university courses this year (Dec 19), and will of course base my design around any consensus established in this issue.
The text was updated successfully, but these errors were encountered: