fud2 Improvements Summer 2024 Lab Notebook #2113
Replies: 11 comments 17 replies
-
Supporting mapping multiple files to multiple files: This presents a couple challenges:
Thoughts and Proposed Solutions:(1) and (2) are mainly UI challenges "solved" by making the nth specified state correspond to the nth specified file. If there are more files than states, the extra files' states are inferred. If there are more states than files, the missing files are read from stdin/written to standard out. I'm still very much figuring through how to deal with (3) and (4). Current Questions I'm Working On:
Current Progress:
Current Next Steps:
|
Beta Was this translation helpful? Give feedback.
-
Spent some more time thinking and talked with Adrian. I think the problem is more about communicating the ambiguity to the user and less about trying to resolve it automatically or something like that. In particular, the main challenge is (4), generalizing --through. The case of finding a path is easy, if there is ambiguity it can be resolved with a single list of choices. It isn't really different when looking at a hyperpath, we just choose the input edge we care about and actually can specify it almost the same I think. The almost is because now we can have multiple of a single state (imagine an op uses two .futil files). To fix this, define ops and states in the graph as the ops and states they are plus the history of inputs leading to that state or op. This does lead to a huge graph but if it is constructed lazily it shouldn't matter because we aren't searching it all as we can resolve ambiguity locally at an op (like by asking the user, or reading from a script, or just arbitrarily). So yeah, maybe that'll work. |
Beta Was this translation helpful? Give feedback.
-
Constraints on Paths through the State/Op GraphI'm currently going with a slightly stronger constraint then that suggested by @sampsyo: "only a single file can be in a given state." For example, if there is a state representing a calyx This constraint implies every op will be used at most once. (If it were to be used more than once then it would require a single state to have two different files associated with it). Therefore, we can keep specifying This solves (3) and (4) by just removing them as possibilities. This constraint is enforced by the algorithm looking for paths and at the time of specifying ops. |
Beta Was this translation helpful? Give feedback.
-
Little Hack Solution For Finding Plans(also update for the week, I forgot to do it yesturday) New constrain, each input of Represent the hypergraph as a bipartite graph of ops and states. find the minimum (well it won't be a minimum, but it will be small and be a DAG which is the important part) subgraph of ops and states fulfilling the constraints:
Intuitively, this is the graph of all states which can be created from a given input set. To actually find this subgraph, dfs from the input nodes (for ops with more than one input, once a dfs visits from a given input, it marks that as seen. once all inputs are seen, the dfs can progress through that op node). This subgraph might not exist (constraint (1) doesn't hold). If it doesn't then there is no possible solution. If it does exist, great! To retrieve the solution "plan" from this subgraph, for each output add the associated If we can find the desired output great, move on to the next one. If we don't find the desired output, there doesn't exist a working path. Limitations:Currently this doesn't support going to a state more than once. It might be possible to change by letting the searches visit nodes twice and just kind of choose inputs arbitrarily preferring newer inputs when available. That sort of solves the problem, but I do think something more principled (maybe based around the program synthesis thing we discussed earlier) would be better in the long run. I don't like this algos arbitrary choices (e.g. of file to use when multiple are available, or of path to an output) in spread out places without obvious reason to the user. ImplementationThe actual implementation I am going with probably won't explicitly find the subgraph. Instead it will just check if an op will be gotten to from an input as needed. This might be slower, but I don't think it should really matter because our graphs are state/op graphs are small (quadratic or cubic time or whatever it ends up being is fine). I also rewrote the "emit ninja" section to work with new plans. It doesn't play nice with rhai yet though and is almost certainly broken as it is very untested. So yeah, not pr ready yet... |
Beta Was this translation helpful? Give feedback.
-
Continuing a discussion from Slack, because this long rambly thinking didn't fit there… Here is roughly what I am thinking about in the "better DSL for fud2 rules" department. If ops are functions, they should look a lot more like functions. Very sketchily, it would be nice if, instead of this (in current Rhai):
We wrote something more like:
There are a few things going on even in this very simple example:
Taking that last bullet further, we definitely want a way to abstract out setup stuff so they can be used in multiple ops. So let's tweak that imaginary example one step farther:
That of course exposes some type weirdness, but that is not the point I was trying to make here. The point is that the non-op function A related frustration in the current "DSL" is the implicit dependencies on "resources" (see the
Note the implicit dependency on
The problem is that every single build command that uses
And merely calling Maybe this gives some grist for a deeper discussion about what an "actually good DSL" might look like? |
Beta Was this translation helpful? Give feedback.
-
Update on GoalsSo the "Little Hack Solution" is bad enough I don't think it is super worthwhile going through with. Implementing was educational (and might take things from it later) but I think I'll let #2134 get stale for a bit (and possibly close it). Current goal is to read some more about program synthesis and then specify in isolation and solve this "path finding question" and possibly understand it's relation to build systems (The more I'm thinking about it, I feel like this build system connection actually isn't that useful the think about, though maybe interesting. Like thinking about the "categorization of paths that go through a certain op" is kind of analogous to having to find paths which go through a "dirty" file. idk). The current approach @sampsyo wrote a little about somewhere but repeating it here very concretely means
The DSL is probably going to go on hold right now. Currently progress is the hacky thing seems to work in it's super constrained way, but it looks backwards compatible, and also some reading about egraphs (egg and egglog) and program synthesis using petri nets and smt solvers. Both seem promising. |
Beta Was this translation helpful? Give feedback.
-
On Finding Plans (sequence of ops taking inputs to outputs)#2134 merged! This will followup with a simple "enumeration of all plans" solution and ways to specify ops with multiple inputs and outputs. Past that, I spent a bit more time reading. The "APIs" Plan
The last bullet is kind of vague. That's intentional, still not totally clear how to quantify the "goodness" of an algorithm (for example, performance doesn't work because once a pretty low threshold is hit, it doesn't matter to much). The DSLThere are three things in
Here I think using Rhai still would be nice because some of @sgpthomas's work can be reused and then the scripting comes for free. The current plan
The above two things should probably be separate prs. |
Beta Was this translation helpful? Give feedback.
-
UpdateA functional draft of the dsl has been implemented! See #2203. Not much else, but yeah.
|
Beta Was this translation helpful? Give feedback.
-
UpdateMore work on #2203 and a little more program synthesis reading. Progress exceedingly slow for no particular reason. |
Beta Was this translation helpful? Give feedback.
-
Using Egg to Optimize an Enumerate Search for ProgramsThis is copy pasted from zulip but I think is worth putting here as an update. Problem Statement You are tasked with finding a sequence of ops which takes some initial set of states to a new set of states with the constaint that there is a set of ops which must be used in this sequence. By being "used" in this sequence, I mean the op must be present in the sequence AND it's outputs, if they were removed, would make construction of the final set of outputs impossible. If states are thought about as types, then this problem redueces to creating a function out of ops, an API of sorts, which satisfies a given type signature. This problem is researched, but generally in the context of large APIs (1000s of functions and types). fud2 is different in that doesn't have this many ops and states (I think a couple hundred is probably a reasonable, future-proof bound) but cares more about the structure of the result: it must have a set of required ops (and bonus points if there are other ways to tweek the output). It is certainly possible to implement these solutions and they would probably work well with some small changes to be more particular about the structure of the resulting programs. They are complicated to implement though and often solve harder problems than needed. Enumerative Solution Encoding using Egg Ops can then be described as rewrites to this string. For example, if op 1 takes state 1 to state 2, it would do the following to the above string: (c x c) (x c) => (c c c) (c c). State 1 is still kept around as ops do not consume states. Egg can be used to encode these rewrites on these string. The desired set of states and ops can then be encoded to a string and searched for. Egg support functionality for retrieving the set of rewrites used to turn an initial string into a final string. This can be directly translated into sequence of ops. Optimizing Egg Solution Further UpdatesTesting is in progress along with documentation (a short tutorial creating example op exists, but the actual additions need to be documented more precisely). |
Beta Was this translation helpful? Give feedback.
-
Summer 2024 WrapupThe main contributions are a working syntax (and backend for that syntax) for writing I think the big three important but unfinished parts of this project are:
Plans: I plan to punt 1 into the future. The current methods for finding plans aren't ideal, but are functional. I think the goal will be to tackle 2 and then 3, probably having 2 inform 3 a bit (and possibly working on 3 while doing 2 as needed). Using As the generated ninja doesn't depend on the config file or input/output file names, verifying the new ninja on a single test is identical to the old ninja is sufficient for noting the agreement of the migration. This is easy to check by running the generated ninja with As a timeline for when this migration will get done, I don't really have one. The hope was it would be done by now, but that is not the case. I'll try best effort to get 2 and 3 done and more generally support |
Beta Was this translation helpful? Give feedback.
-
It's a lab notebook
Beta Was this translation helpful? Give feedback.
All reactions