Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up and revamp functionality for dumping / exporting graphs. #308

Open
haved opened this issue Dec 6, 2023 · 16 comments
Open

Clean up and revamp functionality for dumping / exporting graphs. #308

haved opened this issue Dec 6, 2023 · 16 comments

Comments

@haved
Copy link
Collaborator

haved commented Dec 6, 2023

As discussed in #297, adding overloads for passing maps around manually is not a good solution for increasing observability, though was helpful in the short run.

What would an ideal interface for dumping internal state look like? I'm currently aware of the following functions:

  • The rvsdg::view() functions implemented in jlm/rvsdg/view.cpp, which takes any region and gives SSA outputs that give names to all outputs (and arguments) and shows their uses:
[]{
  o0 := DELTA[MyList] 
    []{
      o1 := ConstantPointerNull 
    }[o1]
  o2 := LAMBDA[next] o0 
    [a3, a4, a5, a6 <= o0]{
      o7 := BITS32(0) 
      o8 := BITS32(4) 
      o9 o10 := ALLOCA[ptr] o8 
      o11 := MemStateMerge o10 a4 
      o12 o13 := LOAD a6 o11 
      o14 := STORE o9 o12 o13 
      o15 o16 := LOAD o9 o14 
      o17 := GetElementPtr o15 o7 o7 
      o18 o19 := LOAD o17 o16 
      o20 := STORE o9 o18 o19 
      o21 o22 := LOAD o9 o20 
    }[o21, a3, o22, a5]
}[o0, o2]
  • jlm-opt's --xml flag, and corresponding xml export, defined in jlm/rvsdg/view.cpp. The xml can be opened in the rvsdg-viewer. The xml does not include information about arguments being mapped to the structural node's inputs, but we could add that.

  • PointsToGraph::ToDot which creates a Graphviz output of the PointsToGraph, that can be rendered with dot.

What would be nice to have?

The reason I made #297 was to be able to connect RegisterNodes to corresponding outputs in the dumped RVSDG. The same could also be done to AllocaNodes, LambdaNodes etc.

The more information we add to the PointsToGraph output, and the larger the program, the less pleasant it becomes in GraphViz, and it makes more sense to have interactive viewer software, so I think adding an xml output option for the PointsToGraph with all this information could make sense.

When the files are all xml, we can even place the different xml outputs in the same .xml file, allowing associated outputs to be located in the same file, reducing the possibility of confusing version mismatch between files.

What would an improvement look like?

In practice I think ToDot can be removed from the PointsToGraph class, making PointsToGraph.cpp more focused on logic.

I was thinking having a general class with helpers for emitting output, that can be configured to be either xml, dot or a simple textual format. My previous attempts at using dot for rvsdg were not successful, so we might consider removing it once there is great viewer software for the xml output.

This class could also include mappings between pointers to more human readable unique ids, and the mappings would stay across invocations. When using viewer software, these ids are not that important, but when using textual output, they are. I'm guessing traversal order is not consistent across invocations of the compiler, due to all the unordered_maps, so the mappings would still not make the output "stable".

My attempts at a WebViewer

I have experimented a bit with an html output, where the output itself is pretty much just the xml (and could in practice be the same as xml), and the "magic" happens in JavaScript. The website it produces lets me move nodes around, scale them, resize and zoom into regions and move nodes in there etc. The UX is however quite unpleasant, since there is no auto-layout, and too many ways of doing the same thing, and mouse clicking starts breaking down in weird ways when zooming too far into nested regions. There should also be a way of hiding edges through named "teleporters" or something.

I would very much like to retry the web-viewer, however, possibly using svg internally instead. It is very low-effort to open html files, but even more importantly, it would be extremely nice to have a compiler explorer-like website for jlc and jlm-opt. Including the option of showing PointsToGraph information would also be nice.

Planning ahead

I would be open to do work on this after I'm done with university courses this year (Dec 19), and will of course base my design around any consensus established in this issue.

@phate
Copy link
Owner

phate commented Dec 7, 2023

Thanks for opening this issue and writing down your thoughts, as well as offering to devote some time to fix this problem.

Reading through your comments, it seems that you would like to move away from graphviz and rather go down the xml road. This is opposite to what I was thinking: I would rather double down on the graphviz approach and abandon the xml approach for the following reason:

  1. The RVSDG xml format is homegrown and only supported by the RVSDG viewer
  2. The RVSDG viewer has not been updated since years and depends (by now) on outdated libraries.
  3. It requires engineering time to keep the RVSDG viewer up to date. Engineering time is the one resource I do not have sufficiently.

In contrast, graphviz would bring the following benefits to the table:

  1. It is a widely used, supported, and well documented graph format.
  2. There exist plenty of viewers that work out of the box and I do not need to devote any time to.
  3. There exist plenty of conversion tools to other formats (pdf, ps, ....)

However, the one benefit of the rvsdg viewer was that it was tailored towards the RVSDG and had a few features that I did not find in other graphviz viewers so far (but I have also not looked very hard):

  1. Marking edges in different colors in the UI. This was extremely useful for debugging.
  2. Removing the contents of structural nodes from view. This allowed to focus only on the parts of the graphs that were relevant
  3. Providing a tree view of the RVSDG, which provided one with the nesting structure of the graph at a glance.
  4. A few other minor things

One additional argument for improving the graphviz support in jlm is that the RVSDG is not the only data structure that would benefit from this. We are currently using graphviz for the following data structures:

  1. The RVSDG HLS dialect for firrtl
  2. The inter-procedural graph
  3. The Control Flow Graph
  4. The Points-To Graph
  5. The disjoint set in Steensgaard
  6. The Aggregation tree could benefit from a graphviz serialization
  7. The Constraint graph in Andersen could maybe benefit from a graphviz implementation as well.

Thus, I would rather invest in graphviz than coming up with our own format again.

RVSDG Viewing Support

As I see it, we have currently two ways of viewing the RVSDG:

  1. As ascii in the terminal. This is only useful for very small graphs, but helps a lot for unit tests.
  2. As xml in the rvsdg viewer. This works for medium to big graphs due to the rvsdg viewer's features.

What I would like to do is to replace the xml format with graphviz + of-the-shelf viewer(s) in the next step. Further down the road, I would like to have a viewer that is RVSDG tailored with (some of the) features from the current RVSDG viewer. I fully agree with you that this should be a website similar to compiler explorer, i.e., one can upload an RVSDG graphviz, get it rendered, and can interact with it for debugging purposes etc.

What do you think?

@haved
Copy link
Collaborator Author

haved commented Dec 7, 2023

You bring up interesting points, and I am definitely not a big XML fan.

When I was looking into debug output earlier, I tried to emit some graphviz, but I had the following issues:

  • I was unable to find a nice way of indexing inputs and outputs to nodes.
  • The drawings produced by dot -Tsvg very quickly got very ugly, and almost impossible to read. I have later found that the circo engine at least makes most graphs more readable.
  • I wasn't sure how best to draw subregions

I did however not look into other ways of viewing GraphViz files, the file format itself seems quite extensible, and dot will happily ignore parameters it doesn't know about, so a lot of extra information can be encoded if a purpose-built viewer needs it. Making outputs that are syntactically valid GraphViz lets us open them in lots of viewer software, but also put any custom stuff in if needed, so I can definitely appreciate that approach.

@davidmetz
Copy link
Collaborator

I found graphviz to be quite usable for the HLS dialect in combination with a customized version of xdot.py.
image
It should be possible to generalize the HLS specific output to normal RVSDG, though at the moment I have hard-coded some things like specific types of nodes being at the top and bottom of sub-regions.
A more general version should probably use nodes for arguments and results of structural nodes, which was not required for HLS since they are mostly symbolic.
xdot.py is an interactive viewer that by default includes panning, zooming and searching and is quite easy to extend.
I for example extended it to support loading waveform files so I can step through simulation cycles in a visual way, while seeing the states of different edges (shape and thickness) as well as content (mouse-over).
I also added the functionality to click an edge to give it a highlighted color. By default xdot.py marks nodes and edges red while hovering over them.

The main thing that would be convenient for larger graphs is collapsible structural nodes.
This should be possible to implement as well, but would be a bit more complex, since it would require modifying and re-layouting the graph.

For the old XML viewer I found it convenient that it included addresses of nodes (though I changed those to hex) and inputs and outputs (though only in the XML), since that helped with finding where exactly in the graph errors occurred.

@davidmetz
Copy link
Collaborator

I created a proof of concept of a generalized dot output based on what I have described above.
It can be found under https://github.com/davidmetz/jlm-public/tree/dot and used by calling jlm-opt --dot
Some things are not ideal yet, like structural outputs that have no associated results not appearing at the bottom, but they could be fixed using invisible edges.
test.dot.zip
image

@phate
Copy link
Owner

phate commented Dec 9, 2023

Ultimately, my goal would be to have a replacement for the rvsdg-viewer, but I currently do not find the choice of viewer the most pressing issue. In my opinion, there is quite some ground to cover in jlm before we should discuss the viewer question. What I envisioned is the following:

  1. Implement a proper dot data model in jlm with which it is possible to create dot graphs programmatically(!). The current "string concatenation approach" is just horrible. This would be a clean approach to create dot graphs for any graph structure we have in jlm and also allow for unit tests.
  2. A pretty printer that takes a dot graph model and converts it to a string and/or writes it to a file.
  3. A dot builder as a (thin) layer above the data model that helps to build dot graphs. (This might not be necessary, it depends)

Point 1 and 2 above are the minimum in order to replace the string concatenation mess that we have all over the place in jlm. This would already go a long way to improve the situation.

Once this is in place, we can tackle the viewer question. It might be that there are already viewers out there that can already do what we would like, but I do not know (yet). My personal wish would have been a viewer on a website where we/everyone can just provide an rvsdg dot file, it renders the graph, and one can interact with it. I am not sure what is out there for doing this (or if we have to build our own), but d3-graphviz seems like a good starting point.

@davidmetz
Copy link
Collaborator

I can see why you would want a clean and generic data model for dot graphs, especially when targeting it from different areas of the compiler.
Based on my experience with the format I think it would make sense to include ports of nodes that edges can connect to and clusters that can group nodes together.
In graphviz multiple nodes can also be explicitly assigned a rank (source, sink, same), which in the example above is used to make arguments and results appear consistently at the top/bottom. Since sink and source are properties of individual nodes they should be easier to model than same, which relates nodes to each other.

@phate
Copy link
Owner

phate commented Dec 16, 2023

@haved : What is your opinion to the comments above?

@haved
Copy link
Collaborator Author

haved commented Dec 16, 2023

I think using GraphViz as the main output format makes sense, and the examples shown by @davidmetz show that there are practical ways of encoding ports for inputs and outputs. It is also cool if the structural nodes show which inputs and outputs are passed in and out of subregions. A viewer should absolutely be able to minimize subregions, though. I'm my experience from trying to make some readable graphs, axis aligned edges with 90° turns are the easiest to read, if laid out to not overlap.

When looking at a region, I'm not sure it makes for the most readable graph to have all arguments in a line at the top, and could imagine some floating markers to reference arguments anywhere, but that could be an optional thing in custom viewer software.

Once I have time, the first thing I'll do is look at the dot files @davidmetz used to get his graphs. I agree that we should have a programmatic way of building the data model, and not string concat.

@phate
Copy link
Owner

phate commented Jan 2, 2024

@haved Did you find any time to tackle this issue yet?

@haved
Copy link
Collaborator Author

haved commented Jan 2, 2024

I have been experimenting with programmatically rendering into svgs from JavaScript for a viewer, although I realize that might not be the best order to go about things. The next time I'm working on this I'll be trying to make a C++ API and data model for graph outputs, supporting subregions and ports. I think we agree on how such an API should look.

@haved
Copy link
Collaborator Author

haved commented Jan 9, 2024

While designing an API and the GraphViz format the output eventually ends up as, I have had to consider how regions should be handled.

One option is to use subgraphs/clusters, which "flattens" the hierarchical graph, and draws borders around regions to indicate region membership. Nodes can not contain subgraphs, so structural nodes become subgraphs, possibly with sub-subgraphs for each region, with ports added in a way different to simple nodes. To preserve a sense of hierarchy, one can apply a scale to make deeper nesting smaller. The nice thing about this approach is that you can display the entire program using just dot, or other GraphViz viewers.

I have also started to consider emitting multiple named graphs in the file, and use metadata to link structural nodes, which become actual GraphViz nodes, to regions. dot only renders the first graph in the file, so you instead rely on a viewer if you want to view multiple levels deep at once. Otherwise you only get one region at once, like the rvsdg-viewer software. I still think this option might be better in the long run, as it allows simple and structural nodes to look the same, boxes representing regions, possibly with a picture of the subregion rendered into the box, and with options for resizing the box if you want a better view. Otherwise you can dig into regions when desired, and minimize them if they are in the way.

The reason I started thinking about several named graphs in the first place was that the user could like to view the RVSDG together with some other graph, like the PointsToGraph, so it would make sense to output them together in the same file, and let the viewer then highlight sematic connections across graphs (e.g. registers referring to specific outputs).

This question is of course not really that important in terms of the API design of the graph output, and both can be options for the final output, I just wanted to share my thoughts.

@phate
Copy link
Owner

phate commented Jan 12, 2024

Otherwise you only get one region at once, like the rvsdg-viewer software.

The rvsdg viewer is able to show multiple regions at once. One can click on a structural node and expand/hide its content, and it will be shown together with the parent region. This works quite nicely, enabling the user to hide the subgraphs from structural nodes that are uninteresting and only drill down into the nodes that are of interest.

I have to say that I do not really have a strong opinion at the moment without having worked with either of the options. My gut feeling tells me that the multiple-graph option might be more future proof, but that is just a feeling. The reason why I am saying this is that it would/should work better with bigger graphs. For smaller graphs, we still have the ascii output that does the job reasonably well. So a dot graph is not strongly needed there (but would certainly be nice to have). Once the graph becomes bigger, one would like to have some possibility to hide uninteresting subgraphs from the users view, and then some kind of viewer support is needed anyway.

That being said, we currently do not have the possibility to emit RVSDG dot graphs at all, so either option is a step forward. The dot representation of an RVSDG is not set in stone. If it turns out that the option we chose does not do the trick, then we can still iterate on it easily as long as the programmatic model supports the possibilities we want to utilize from dot.

@sjalander
Copy link
Collaborator

We have the start of an MLIR backend, and it seems like there is support for generating dot graphs from MLIR dialects
https://mlir.llvm.org/doxygen/ViewOpGraph_8cpp.html
https://mlir.llvm.org/doxygen/ViewOpGraph_8cpp_source.html
It "Defines interface to produce Graphviz outputs of MLIR op within block.", i.e., a graph per RVSDG region.

I don't know how the generated dot looks like, and we might want additional functionality or visualize things differently. Just wanted to throw it out there. It would of course limit the graphs to the IR itself and not have support for, e.g., the points to graph.

@haved
Copy link
Collaborator Author

haved commented Jan 13, 2024

Interesting, I would like to see an example of output so I'll try to either find it or make one. MLIR supports both regions and control flow within them, while we get away with only regions, so it will be interesting to see how they handle the extra distinction that we don't have to make.

@phate
Copy link
Owner

phate commented Feb 15, 2024

@haved Did you find time to make any progress on this front?

@haved
Copy link
Collaborator Author

haved commented Feb 15, 2024

@phate I've been doing this today, will probably "finish" it tomorrow, and let you give your feedback while I resume on the Andersen solver using the work list, which is also pretty much finished.

EDIT: okay so finish tomorrow was a bit optimistic (after the first 90% come the last 90%), will come back with a PR soon™

phate added a commit that referenced this issue Feb 29, 2024
…g as ASCII or dot (#403)

This is the API I ended up with, it uses some `friend` declarations to
ensure internal consistency, so the public API should be hard to misuse.

You create **graphs**, where you can add **nodes** and **edges**. These
can all have labels and arbitrary attributes set, which will be included
in the dot output. The ASCII output ignores attributes for brevity.

There are also **ArgumentNode**, **ResultNode** and **InOutNode** that
are nice when representing data flowing into the graph, through nodes,
and out of the graph. InOutNodes have an arbitrary number of input ports
and output ports, which can also have labels and attributes. They can
also have subgraphs.

The InOutNodes use the HTML table labels in dot. In dot, subgraphs are
rendered as separate graphs, so I was planning on customizing a
JavaScript library for GraphViz to make graphs inside nodes render
nicely when requested.

The ArgumentNodes and ResultNodes have the option of being connected to
ports in outside graphs. Otherwise, edges must go between ports within
the same graph.

All GraphElements have the option of being mapped to a program object
(any type of pointer), which serves two purposes: You can look up
elements by program object, to aid in building the graphs, and you can
refer to program objects by pointer in attributes, and have the pointer
be replaced by the short unique id of the graph element.

I'm very open to suggestions for changes to the interface, internals or
the output, as this should be something usable for everybody.

There is currently a helper function for setting the background color on
a node/port.
I could envision a lot more helper functions, to make it easier for the
end user to find useful and correct attributes, e.g.
```
edge.SetAttribute("style", "dashed");
// becomes
edge.SetStyle(EdgeStyle::Dashed);
```

The code takes care to escape strings in both regular dot and the HTML
sub-language.
I will of course add tests to all the functionality that it makes sense
to test.

Under follows some examples of API usage, and the corresponding output
in ASCII and rendered GraphViz:

<details>
<summary>A regular graph</summary>

```cpp
  jlm::util::GraphWriter writer;
  auto & graph = writer.CreateGraph();
  graph.SetAttribute("rankdir", "LR");

  auto & node = graph.CreateNode();
  node.SetLabel("DELTA");
  node.SetAttribute("shape", "oval");

  auto & node2 = graph.CreateNode();
  node2.SetLabel("ALLOCA");

  auto & edge = graph.CreateDirectedEdge(node, node2);
  edge.SetAttribute("style", "dashed");
  edge.SetAttribute("minlen", "3");
  edge.SetAttribute("headlabel", "Pointee");
  edge.SetAttribute("taillabel", "Pointer");

  auto & node3 = graph.CreateNode();
  node3.SetLabel("ALLOCA2");
  auto & edge2 = graph.CreateUndirectedEdge(node2, node3);
  edge2.SetLabel("Unified");
```
becomes
```
{
  node0:DELTA
  node1:ALLOCA<-[node0, node2]
  node2:ALLOCA2<-node1
}
```
and

![image](https://github.com/phate/jlm/assets/3748845/822ebfdc-7c4d-46a8-bc41-63a1756f6b40)
</details>

<details>
<summary>A dataflow graph</summary>

```cpp
  jlm::util::GraphWriter writer;
  auto & graph = writer.CreateGraph();
  auto & arg0 = graph.CreateArgumentNode();
  arg0.SetLabel("CTX");
  arg0.SetFillColor("yellow");
  auto & res0 = graph.CreateResultNode();

  auto & node0 = graph.CreateInOutNode(0, 1);
  node0.SetLabel("\"Cool Name!\"");
  node0.SetFillColor("gray");

  auto & node1 = graph.CreateInOutNode(3, 3);
  node1.SetLabel("<Main Node>");
  node1.GetOutputPort(1).SetFillColor("yellow");

  auto & node2 = graph.CreateInOutNode(2, 1);
  node2.SetLabel("End Node");

  graph.CreateDirectedEdge(arg0, node1.GetInputPort(2));

  graph.CreateDirectedEdge(node0.GetOutputPort(0), node1.GetInputPort(1));
  graph.CreateDirectedEdge(node0.GetOutputPort(0), node1.GetInputPort(2));

  graph.CreateDirectedEdge(node1.GetOutputPort(0), node2.GetInputPort(0));
  graph.CreateDirectedEdge(node1.GetOutputPort(1), node2.GetInputPort(1));

  graph.CreateDirectedEdge(node2.GetOutputPort(0), res0);
```
becomes
```
{
  ARG a0
  o0 := "\"Cool Name!\""
  o1, o2, o3 := "<Main Node>" [], o0, [a0, o0]
  o4 := "End Node" o1, o2
  RES o4
}
```
and

![image](https://github.com/phate/jlm/assets/3748845/bbeabe51-4be9-49cf-befb-a08557885d15)
</details>

<details>
<summary>A graph with a subgraph</summary>

```cpp
  jlm::util::GraphWriter writer;
  auto & graph = writer.CreateGraph();

  auto & source = graph.CreateNode();
  auto & node = graph.CreateInOutNode(3, 3);
  graph.CreateDirectedEdge(source, node.GetInputPort(0));
  graph.CreateDirectedEdge(node.GetOutputPort(0), node.GetInputPort(2));

  auto & subgraph = node.CreateSubgraph();
  auto & arg0 = subgraph.CreateArgumentNode();
  auto & arg1 = subgraph.CreateArgumentNode();
  arg0.SetOutsideSource(node.GetInputPort(0));
  arg1.SetOutsideSource(node.GetInputPort(2));

  auto & res0 = subgraph.CreateResultNode();
  auto & res1 = subgraph.CreateResultNode();
  res0.SetOutsideDestination(node.GetOutputPort(2));
  res1.SetOutsideDestination(node.GetOutputPort(1));

  subgraph.CreateDirectedEdge(arg0, res1);
  subgraph.CreateDirectedEdge(arg1, res0);
```
becomes
```
{
  node0:NODE
  o0, o1, o2 := NODE node0, [], o0
    {
      ARG a0 <= node0, a1 <= o0
      RES a1 => o2, a0 => o1
    }
}
```
and

![image](https://github.com/phate/jlm/assets/3748845/85d2e8c8-ad52-49f1-ad73-2e59d17ee3a6)
</details>

This PR is part of #308

---------

Co-authored-by: Nico Reissmann <[email protected]>
phate added a commit that referenced this issue Mar 1, 2024
Apologies, but I realized I had mis-read the dot language spec. Unlike
in HTML/XML, all attribute values must always be quoted. This PR fixes
this. It is part of #308

Co-authored-by: Nico Reissmann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants