Deserialization/Storage of StackGraphs #114
-
I really like the the serialization implementation for StackGraphs. It enables the visualization implementation which is extremely useful. However, for storage the serialization isn't really useful without a deserialization implementation. Yet in a quote from the blog post:
Since StackGraphs are stored for each file, is the point of the the Path stitching to connect precomputed file-based paths with external references? Is that the reasoning behind only storing StackGraphs for files, and then merging them into another StackGraph? Is it not possible to modify select files within a StackGraph and then, in reference to the previous StackGraph, efficiently recompute paths? I guess storing paths has a large worst case space complexity, but in practice methods/functions are often only referred to once or twice with a small minority being used widely throughout the repository. I'm a little confused on how are those StackGraphs stored and loaded, and I would love to do the same. For example, If I wanted to compute affected paths when comparing a PR on a branch, by 1) loading previous (repo-wide) stack graph for that branch, 2) checking affected files (or even lines) from a git diff on the PR, 3) recomputing paths and nodes for only those files/lines affected, and then 4) store a new StackGraph (or even just storing affected paths in relation to a higher-level branch), would it be possible? |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 5 replies
-
Good questions @maxwnewcomer! What you are asking about is the incrementality of the system: if one file changes, how much work has to be redone? At the minimum the stack graph for that file has to be recomputed, as you correctly said. But what about those paths? If complete (ref to def) paths were computed in the full graph (all files combined), it is a difficult problem to determine which paths were invalidated, which failed attempts (in the previous graph) may now succeed, etc. Instead of doing that, the path finding algorithm can be split up in two phases. One precomputes partial paths per file, and if the file doesn't change, neither do these. The second phase is stitching these partial paths together to form complete paths, which is done based on all partial paths from all files. But because the partial paths are limited to useful paths (starting at a reference, starting at a definition) this is more efficient that searching complete paths in the full graph from scratch. Your next question is about storing and loading this information. The library currently only serializes the graph and complete paths, because these are necessary for visualization. Being able to load a stack graph would be a first step. Being able to store and load partial paths would be a next step. These partial paths would need to be loaded in with This could be something to contribute to the library. If you want to pick this up, I would suggest to start with deserialization of stack graphs, as the serialization part is already there. As a follow up you could do serialization & deserialization for partial paths. |
Beta Was this translation helpful? Give feedback.
-
I see good value in being able to serialize and deserialize a stack-graph. @hendrikvanantwerpen am i right in understanding that the existing I would like to give this a go; writing a |
Beta Was this translation helpful? Give feedback.
-
Note that currently deserialization of stack graphs is currently supported and built-in so implementing deserialization yourself is no longer needed. This requires using the |
Beta Was this translation helpful? Give feedback.
Good questions @maxwnewcomer!
What you are asking about is the incrementality of the system: if one file changes, how much work has to be redone? At the minimum the stack graph for that file has to be recomputed, as you correctly said. But what about those paths? If complete (ref to def) paths were computed in the full graph (all files combined), it is a difficult problem to determine which paths were invalidated, which failed attempts (in the previous graph) may now succeed, etc. Instead of doing that, the path finding algorithm can be split up in two phases. One precomputes partial paths per file, and if the file doesn't change, neither do these. The second phase is stitching these part…