fud2 Improvements Summer 2024 Lab Notebook #2113

jku20 · 2024-06-06T16:35:20Z

jku20
Jun 6, 2024
Collaborator

It's a lab notebook

jku20 · 2024-06-06T18:34:48Z

jku20
Jun 6, 2024
Collaborator Author

Supporting mapping multiple files to multiple files:
Currently fud2 works by mapping one file to another. It'd be nice if it could map multiple files to multiple others. See #1958 for more information.

This presents a few challenges:

1. Many input states and output states now have to be specified, along with many files.
2. Some subset of these states may have files associated with them, while others are read from stdin or written to stdout.
3. Transitions can potentially take in multiple identical states while not necessarily being symmetric.
4. It is useful to take in a set of ops using --through and always try to use these ops, but now fud2 finds a set of hyperpaths (maybe), so what the argument should do is unclear.

Thoughts and Proposed Solutions:

(1) and (2) are mainly UI challenges "solved" by making the nth specified state correspond to the nth specified file. If there are more files than states, the extra files' states are inferred. If there are more states than files, the missing files are read from stdin/written to standard out.
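To make the pairing rule concrete, here is a minimal Python sketch. The function name and the use of None as a placeholder are assumptions for illustration; this is not the actual fud2 CLI code.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple


def pair_states_with_files(
    states: List[str], files: List[str]
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Pair the nth state with the nth file (hypothetical helper).

    A state of None means "infer the state from the file".
    A file of None means "use stdin/stdout instead of a file".
    """
    # zip_longest pads the shorter list with None by default.
    return list(zip_longest(states, files))


# More files than states: the extra file's state must be inferred.
print(pair_states_with_files(["calyx"], ["a.futil", "b.sv"]))
# More states than files: the missing file falls back to stdin/stdout.
print(pair_states_with_files(["calyx", "verilog"], ["a.futil"]))
```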

I'm still very much working out how to deal with (3) and (4).
(3) and (4) both deal with resolving ambiguity in the chain of dependencies we are using. I'm pretty sure finding a nice way to resolve this ambiguity (and actually caring about how it is resolved at all) is what differentiates this problem from just being a normal build system or some of the other applications of hypergraphs. Still not totally sure though.

Current Questions I'm Working On:

get a more concrete understanding of how current build systems get a plan of what to run
- answer the question: is there a good way to add constraints to these methods which force certain plans to be generated
read some more about hypergraphs
- answer the question: does a similar problem exist where there are these "interchangeable dependencies" where the right one has to be chosen

Current Progress:

changed CLI parsing to do the thing I described (though that isn't very meaningful before doing more on actually finding paths, if they are even still paths)
progress (though not much) towards changing the middle bit of fud2 before (3) and (4) make themselves unavoidable

Current Next Steps:

figure out a way to deal with (3) and (4)
implement!


jku20 · 2024-06-07T01:12:28Z

jku20
Jun 7, 2024
Collaborator Author

Spent some more time thinking and talked with Adrian. I think the problem is more about communicating the ambiguity to the user and less about trying to resolve it automatically or something like that.

In particular, the main challenge is (4), generalizing --through. The case of finding a path is easy: if there is ambiguity, it can be resolved with a single list of choices.

It isn't really different when looking at a hyperpath: we just choose the input edge we care about, and I think we can specify it almost the same way.

The almost is because now we can have multiple of a single state (imagine an op uses two .futil files). To fix this, define ops and states in the graph as the ops and states they are plus the history of inputs leading to that state or op. This does lead to a huge graph but if it is constructed lazily it shouldn't matter because we aren't searching it all as we can resolve ambiguity locally at an op (like by asking the user, or reading from a script, or just arbitrarily). So yeah, maybe that'll work.

1 reply

sampsyo Jun 7, 2024
Maintainer

This sounds about right to me! Thanks for summarizing. It also seems like we can probably defer the case where an op has multiple inputs from the same state—I can't think of a case where we actually need that (probably because of the inherent ambiguity it represents).

jku20 · 2024-06-10T18:27:07Z

jku20
Jun 10, 2024
Collaborator Author

Constraints on Paths through the State/Op Graph

I'm currently going with a slightly stronger constraint than that suggested by @sampsyo: "only a single file can be in a given state." For example, if there is a state representing a calyx .futil file, there cannot be two inputs (or even intermediate states) a.futil and b.futil.

This constraint implies every op will be used at most once. (If it were to be used more than once then it would require a single state to have two different files associated with it). Therefore, we can keep specifying --through as a list of op names just like before the change!

This solves (3) and (4) by just removing them as possibilities.

This constraint is enforced by the algorithm looking for paths and at the time of specifying ops.
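As a rough illustration of what enforcing this could look like, here is a small Python sketch. It assumes a plan is a list of (op name, input states, output states) tuples; that representation and the function name are made up for the example.

```python
def violates_single_file_constraint(input_states, plan):
    """Return True if some state would hold more than one file.

    plan is a list of (op_name, op_input_states, op_output_states) tuples.
    This layout is an assumption for the sketch, not fud2's real data model.
    """
    # Two inputs in the same state already break the constraint.
    if len(set(input_states)) != len(input_states):
        return True
    occupied = set(input_states)
    for _name, _ins, outs in plan:
        for state in outs:
            if state in occupied:
                # A second file would land in an already-occupied state.
                return True
            occupied.add(state)
    return False


ok_plan = [("calyx-to-verilog", ["calyx"], ["verilog"])]
print(violates_single_file_constraint(["calyx"], ok_plan))
```

Note that an op with a self-loop (a calyx file in, a calyx file out) is rejected by this check, since its output state is already occupied by its input.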

4 replies

sampsyo Jun 12, 2024
Maintainer

Sorry for the lag here, but just wanted to say I endorse this strengthened restriction: it makes a lot of sense to keep the "spec" for the search problem simple, and it does not seem to be an important limitation in practice. Excellent!

jku20 Jun 12, 2024
Collaborator Author

Actually, this restriction breaks at least one current op (thanks CI) which has a self loop, so I'm currently trying to figure out a way to relax it.

I talked to Nathanial, who gave a pretty good reason for having loops (which implies multiple files for a given state): it lets ops be reused. For example, if an op takes in a calyx file and outputs a calyx file, this output can then be used with all the other ops which take in a calyx file, instead of having to define nearly identical ops to process these "almost calyx files".

Of course allowing loops isn't the only solution. Something like letting states implement "traits" and letting ops also take in and output "traits" solves this. States then start sounding a lot like types with these ops as functions between types (then files themselves are the data). Is this the core of what designing the dsl is about? Currently living in that mindset, trying to figure out if I can use this instead of thinking about everything as a graph.

sampsyo Jun 12, 2024
Maintainer

Huh, I absolutely did not think of self-loops. Good point.

Your type-based thinking could be a promising way out of this. One other thing I'll point out, though: it seems like a reasonable trade-off to say that, if you want self-loops (or to otherwise reuse a state), that has to come from some kind of explicit signaling about the path you want (currently the --through mechanism). Like, it seems reasonable to say that—because we are looking for shortest paths—we will never automatically find a path that contains a cycle, because removing that cycle always yields a valid path that is strictly shorter. So it seems like a reasonable UI trade-off to require extra hand-holding when you want to use these cases.
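A tiny sketch of that shortest-path argument, using plain BFS over single-input, single-output ops. The op name "calyx-opt" is a hypothetical self-loop invented for the example; "calyx-to-verilog" is borrowed from elsewhere in this thread.

```python
from collections import deque


def shortest_op_path(ops, start, goal):
    """BFS over states; ops is a list of (name, src_state, dst_state)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, src, dst in ops:
            # A self-loop leads to an already-seen state, so it is never taken.
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None


ops = [
    ("calyx-opt", "calyx", "calyx"),  # hypothetical self-loop op
    ("calyx-to-verilog", "calyx", "verilog"),
]
print(shortest_op_path(ops, "calyx", "verilog"))
```

The search never returns a path through the self-loop on its own, which is why reaching it would need an explicit request like --through.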

I don't know if this solves the immediate problem, of course.

States then start sounding a lot like types with these ops as functions between types (then files themselves are the data). Is this the core of what designing the dsl is about?

It was not something I had previously thought of focusing on, but it seems important and I think it is definitely worth thinking about. I think it's related to something that's very much already happening with our current fud2 setup, which is that there are many "variants" of states that are kinda wonky and exist to constrain which paths are legal. For example, we have:

verilog
verilog-refmem
verilog-noverify
verilog-refmem-noverify

These all exist for a reason. The refmem variants exist because they need extra treatment to provide a testbench before they are usable. The noverify variants exist because Icarus Verilog cannot use the assertions that the Calyx compiler emits by default (while Verilator can). But they are all, of course, actually just Verilog code! So some things you want to do with Verilog could hypothetically work on all of them (not that we have such an example right now). So there is some wisdom in re-conceiving states to be something more flexible than they are now…

sampsyo Jun 12, 2024
Maintainer

Just one more level of thoughts about the interaction of this surprisingly-tricky feature and "the DSL": one way to think about the problem we're currently running into (self-loops aside) is this. Ops are functions, states are types. A plan is a program. So searching for a plan is type-directed program synthesis.

Like, you can imagine that a user's request looks like a sketch of this sort:

def build(in1: state1, in2: state2) -> state3, state4:
  # program synthesizer, please write me a function body that type-checks!

And our job is to produce:

def build(in1: state1, in2: state2) -> state3, state4:
  foo = some_op(in1, in2)
  bar = another_op(foo, in2)
  baz = yet_another_op(foo, bar)
  return bar, baz

Fortunately, the programs we're trying to synthesize are always straight-line (no control flow). But this maybe emphasizes why it's sort of a hard problem, and why it's only going to get harder if the types get more complicated. Which makes me think that it would be a really good idea, if we can, to decouple the two complexifying factors of "add fancier types so we don't need so many first-order types" and "allow functions to have multiple inputs and multiple outputs"?
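To make the "plan is a program" view concrete, here is a small Python sketch of a checker for straight-line plans. The op signatures and the intermediate state name "mid" are invented for the example; only the shape of the build function comes from the sketch above.

```python
def type_checks(signatures, params, body, returns, expected):
    """Check a straight-line plan against op signatures.

    signatures: op name -> (tuple of input states, output state)
    params:     list of (variable, state) pairs for the inputs
    body:       list of (result variable, op name, argument variables)
    """
    env = dict(params)  # variable -> state
    for result, op, args in body:
        in_states, out_state = signatures[op]
        # An undefined variable maps to None and so never matches.
        if [env.get(a) for a in args] != list(in_states):
            return False
        env[result] = out_state
    return [env.get(r) for r in returns] == list(expected)


# Hypothetical signatures; "mid" is a made-up intermediate state.
signatures = {
    "some_op": (("state1", "state2"), "mid"),
    "another_op": (("mid", "state2"), "state3"),
    "yet_another_op": (("mid", "state3"), "state4"),
}
params = [("in1", "state1"), ("in2", "state2")]
body = [
    ("foo", "some_op", ["in1", "in2"]),
    ("bar", "another_op", ["foo", "in2"]),
    ("baz", "yet_another_op", ["foo", "bar"]),
]
print(type_checks(signatures, params, body, ["bar", "baz"], ["state3", "state4"]))
```

Checking a candidate plan is the easy direction. The search problem is producing a body that passes this check.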

jku20 · 2024-06-14T17:24:45Z

jku20
Jun 14, 2024
Collaborator Author

Little Hack Solution For Finding Plans

(also update for the week, I forgot to do it yesterday)
The algorithm that failed CI is actually more fundamentally broken, but the issue isn't the constraints on the problem, just me being dumb: it can easily miss ops passed with --through because it doesn't explicitly look for them. It just looks non-exhaustively at paths and, if it sees one with the wanted op, chooses that. I'm going to put some time into a more sophisticated look at the approach discussed above, but before that I'm rewriting it with the following hack-y algo:

New constraint: each input of --through is associated with a single output.

Represent the hypergraph as a bipartite graph of ops and states.

Find the minimum subgraph of ops and states fulfilling the following constraints (well, it won't be a minimum, but it will be small and be a DAG, which is the important part):

1. Outputs and required ops are contained in the subgraph.
2. All ops in the subgraph have their inputs in the subgraph.
3. All states except inputs have an op creating them in the subgraph.

Intuitively, this is the graph of all states which can be created from a given input set.

To actually find this subgraph, dfs from the input nodes (for ops with more than one input, once a dfs visits from a given input, it marks that as seen. once all inputs are seen, the dfs can progress through that op node).

This subgraph might not exist (constraint (1) doesn't hold). If it doesn't then there is no possible solution. If it does exist, great! To retrieve the solution "plan" from this subgraph, for each output add the associated --through op (and all ops needed to get there) to the solution. Then search from that to the output. Add all ops on that path (and also all ops those ops need) to the plan.

If we can find the desired output great, move on to the next one. If we don't find the desired output, there doesn't exist a working path.
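Here is a minimal Python sketch of the "everything creatable from the inputs" step. It uses a simple fixpoint loop rather than the DFS with per-input marking described above, and the dict-of-sets representation of ops is an assumption for illustration.

```python
def reachable(ops, inputs):
    """Compute all states creatable from inputs and the ops that can fire.

    ops maps an op name to (set of input states, set of output states).
    """
    states = set(inputs)
    used_ops = set()
    changed = True
    while changed:
        changed = False
        for name, (ins, outs) in ops.items():
            # An op can only fire once all of its inputs are available.
            if name not in used_ops and ins <= states:
                used_ops.add(name)
                states |= outs
                changed = True
    return states, used_ops


def subgraph_exists(ops, inputs, outputs, required_ops):
    """Constraint (1): outputs and required ops must all be reachable."""
    states, used_ops = reachable(ops, inputs)
    return set(outputs) <= states and set(required_ops) <= used_ops


ops = {
    "a": ({"x", "y"}, {"z"}),  # needs both x and y
    "b": ({"z"}, {"w"}),
    "c": ({"q"}, {"r"}),       # never fires without q
}
print(reachable(ops, {"x", "y"}))
print(subgraph_exists(ops, {"x"}, {"z"}, []))
```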

Limitations:

Currently this doesn't support going to a state more than once. It might be possible to change by letting the searches visit nodes twice and just kind of choose inputs arbitrarily preferring newer inputs when available.

That sort of solves the problem, but I do think something more principled (maybe based around the program synthesis thing we discussed earlier) would be better in the long run. I don't like this algo's arbitrary choices (e.g. of which file to use when multiple are available, or of which path to take to an output), made in scattered places without an obvious reason visible to the user.

Implementation

The actual implementation I am going with probably won't explicitly find the subgraph. Instead it will just check, as needed, whether an op can be reached from an input. This might be slower, but I don't think it should really matter because our state/op graphs are small (quadratic or cubic time or whatever it ends up being is fine).

I also rewrote the "emit ninja" section to work with new plans. It doesn't play nice with rhai yet though and is almost certainly broken as it is very untested. So yeah, not pr ready yet...

4 replies

sampsyo Jun 19, 2024
Maintainer

Great; thanks for writing this out. First, I think that your "spec" for what counts as a legal subgraph sounds right to me (those three points in the enumerated list). It doesn't say anything about optimality, which is OK; that seems a lot harder to state. (We would want something that at least gives you the shortest path when the graph you're looking for happens to be a path.)

I don't completely understand the treatment of --through in this writeup. Maybe some pseudocode would help. Let's start with how I'm interpreting the algorithm for a --through-free version. It's something like this (not including the "breadcrumbs" to find the actual path):

def search(outputs, inputs):  # inputs and outputs are both sets of states
  active = outputs
  while active:
    cur = active.pop_arbitrary_element()  # the order we pick makes it DFS or BFS or whatever
    if cur is not seen:
      mark cur as seen
      for op in cur.ops_that_can_produce_this():
        if all(s is seen for s in op.outputs):
          active += op.inputs

I dunno, something like that? It seems like it's on the right track, although the "breadcrumbs" aspect would be hard to add.

To add --through, I guess what I was intuitively imagining (and maybe this is what you're saying) is that we essentially make the --through ops work as the search "destinations." That is, the inputs argument to search in the pseudocode might be states (for actual inputs) or ops (for --through destinations). Then we call search in a loop, iterating through all the --through ops for all the inputs. Something like this?

jku20 Jun 19, 2024
Collaborator Author

I think the search you are describing is part of the algorithm, but not the whole thing. That search is used to find "segments" of the final plan (like what was done before handling multiple inputs and outputs).

There isn't really a version of this algorithm without --through arguments so I'm going to describe just the version using --through. Each value in through is paired with a value in outputs.

In the below, both states and ops will have inputs and outputs, representing nodes with edges into the given node and node with edges from the given node.

def search(outputs, through, inputs):
  for cur_out, cur_through in zip(outputs, through):
    # find plan which runs the "through" node as the last thing the plan does
    path_to_through = find_plan(cur_through.inputs, inputs)
    # find path from the "through" node to the output
    active = {cur_through}
    while active:
      v = active.pop_arbitrary_element()
      if v is not seen:
        mark v as seen
        active += v.outputs
        if v is an op:
          # store a plan which runs v as the last thing it does
          segment[v] = find_plan(v.inputs, inputs)
        if v is cur_out:
          path = backtrack to find path from cur_through to cur_out
          plan = path_to_through
          for op in path:
            plan = merge(plan, segment[op])
          # plan is now the plan for finding the given output
          # merge this plan with the plan for constructing all other outputs to get an answer
          break

merge takes two plans and combines them into one which generates the same states but doesn't do duplicate work (e.g. if both plans call a single op, the merged plan would only call that op once).
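A minimal sketch of what merge could look like, assuming a plan is just a list of op names in a valid execution order. The real plan type likely carries more information, so treat this as an illustration only.

```python
def merge(plan_a, plan_b):
    """Combine two plans, running each op at most once.

    Assumes each plan is a list of op names in a valid execution order.
    """
    merged = list(plan_a)
    seen = set(plan_a)
    for op in plan_b:
        # Skip ops the first plan already runs.
        if op not in seen:
            seen.add(op)
            merged.append(op)
    return merged


print(merge(["a", "b"], ["a", "c"]))
```

The order stays valid because anything an op from plan_b depends on appears earlier in plan_b, so it is either already in plan_a or was appended before it.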

The slightly weird thing here is the algorithm finds paths to ops instead of states when it gets a path to the value specified by --through.

To make this work when an op is not specified by --through, it can assume one of the ops going to an output state was specified by --through. If it works, great; if not, try a different op until one does (or none do).

If there are more ops specified by --through than states, then this algorithm doesn't work. (Something similar to the path case can be done: first find a "segment" to the first value specified in --through, then one to the next, and so on. I haven't implemented this, though, because I just remembered this case.)

sampsyo Jun 19, 2024
Maintainer

Thanks! I think I’m getting closer to understanding here. My question now is:

What is the difference between this search and the helper find_plan? It seems like the latter would need to do essentially the same thing as this search written here. So the thing I’m not quite understanding is why we seem to need the DFS-like logic in two places? I thought we would have been able to get away with one DFS-like search, and to wrap that in an outer loop.

And one much less important thing, but just to throw it out there:

Would it be simpler/more elegant in the end to associate --through ops with inputs instead of outputs? I don’t know if that is good UX, but it could simplify the algorithm because inputs and --through ops would be sort of interchangeable. As in, the search would start from outputs and proceed by searching from there in an attempt to reach a mixture of input states and --through ops, handled in a uniform way. Once that’s done, we replace a given now-satisfied --through op with the associated input state (or the next step in the --through sequence, if multiple are allowed) and search again.

In the below, both states and ops will have inputs and outputs, representing nodes with edges into the given node and node with edges from the given node.

I suppose these are usually called “predecessors” and “successors” in graphs.

jku20 Jun 19, 2024
Collaborator Author

find_plan works like search but with no --through argument. The logic in find_plan finds a tree, a plan which runs an op. The logic in search finds a path (not a tree!) from an op node passed with --through to an output.

I guess, as opposed to what I was saying before, find_plan is kind of the --throughless version of the algorithm and search is what implements --through by using find_plan.

On the second question, I don't see how that guarantees every item passed in using --through gets used. Currently, starting from outputs doesn't yield any particular inputs, and it's hard to make it do so because the algorithm doesn't know what subset of inputs, or even what ops, can generate a given output. I haven't been able to see a super nice way to assign inputs to different outputs before a search, or before even seeing what sets of inputs or ops can exist in a plan generating an output.

sampsyo · 2024-06-19T15:06:09Z

sampsyo
Jun 19, 2024
Maintainer

Continuing a discussion from Slack, because this long rambly thinking didn't fit there…

Here is roughly what I am thinking about in the "better DSL for fud2 rules" department. If ops are functions, they should look a lot more like functions. Very sketchily, it would be nice if, instead of this (in current Rhai):

op(
  "calyx-to-verilog",
  [calyx_setup],
  calyx_state,
  verilog_state,
  |e, input, output| {
    e.build_cmd([output], "calyx", [input], []);
    e.arg("backend", "verilog");
  }
);

We wrote something more like:

def-op calyx-to-verilog(in: Calyx) -> (out: Verilog):
  shell("calyx -b verilog $in > $out")

There are a few things going on even in this very simple example:

Shallowly, just nicer syntax that looks like functions, with arguments/return types that look like states.
"Variables" that get interpolated into command strings. Like, we would check (at compile time) that $in and $out in the shell string are actually defined; mistyping $otu would be an error.
Elimination of the tedious management of explicit setups. Note that we are giving the command to execute right here, and we don't have to create a named setup and then explicitly depend on it.

Taking that last bullet further, we definitely want a way to abstract out setup stuff so they can be used in multiple ops. So let's tweak that imaginary example one step farther:

def calyx(in: Calyx, backend: str) -> (out: Any):
  shell("calyx -b $backend $in > $out")

def-op calyx-to-verilog(in: Calyx) -> (out: Verilog):
  calyx(in, "verilog")

def-op calyx-to-firrtl(in: Calyx) -> (out: Firrtl):
  calyx(in, "firrtl")

That of course exposes some type weirdness, but that is not the point I was trying to make here. The point is that the non-op function calyx amounts to defining a setup that will be shared between the two ops. But we don't have to explicitly declare the setup, or carefully remember to use the right setup that contains the Ninja rule that we want to use. We just invoke the function, and the compiler tracks for us which setups that entails. This would of course need to work recursively, as helper functions invoke helper functions.

A related frustration in the current "DSL" is the implicit dependencies on "resources" (see the rsrc calls). A good example is the use of json-data in the RTL simulation rules, which looks like this:

e.build_cmd(
  [output],
  "json-data",
  ["$datadir", "sim.log"],
  ["json-dat.py"],
);

Note the implicit dependency on json-dat.py. This is a resource file, and it comes from a call like this in the associated setup:

e.rsrc("json-dat.py");

The problem is that every single build command that uses json-data needs to remember to include the implicit dependency on json-dat.py. This is a fundamental abstraction violation; the json-data rule is part of the implementation of the build step and shouldn't be exposed in its interface. It would be much better if we could define a helper function like this:

def json-dat(datadir: str) -> (out: Json):
  json-dat-py = rsrc("json-dat.py")
  shell("python $json-dat-py --from-json $datadir > $out")

def-op simulate(simulator: Sim, data: Json) -> (out: Json):
  data-dir = json-dat(data)
  exec-sim(simulator, data-dir)

And merely calling json-dat(data) within our op will somehow automatically introduce the right dependencies because of the call within the implementation.
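One way to picture the "automatic" part is a context object that records every resource and command a helper touches, so the op never has to list them. The Python below is only a sketch of that idea; OpContext and the placeholder exec-sim command line are hypothetical, not fud2 or Rhai APIs.

```python
class OpContext:
    """Records everything an op's implementation touches (hypothetical)."""

    def __init__(self):
        self.resources = []
        self.commands = []

    def rsrc(self, name):
        # Registering a resource is what creates the dependency.
        if name not in self.resources:
            self.resources.append(name)
        return name

    def shell(self, command):
        self.commands.append(command)


def json_dat(ctx, datadir, out):
    script = ctx.rsrc("json-dat.py")
    ctx.shell(f"python {script} --from-json {datadir} > {out}")


def simulate(ctx, data, out):
    # The op never mentions json-dat.py; the helper brings it along.
    json_dat(ctx, data, "data-dir")
    ctx.shell(f"exec-sim data-dir > {out}")  # placeholder command line


ctx = OpContext()
simulate(ctx, "data.json", "out.json")
print(ctx.resources)
```

After the call, the context knows about json-dat.py even though simulate never named it, which is the encapsulation the current build_cmd interface lacks.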

Maybe this gives some grist for a deeper discussion about what an "actually good DSL" might look like?

2 replies

sgpthomas Jun 19, 2024
Maintainer

An abstract thought while I'm thinking about this: right now writing fud2 stages requires knowledge about how Ninja works. For example, sometimes you need to pass "implicit deps" to build_cmd and knowing what this means requires understanding the order that Ninja builds things. Ideally you shouldn't have to think about this while writing fud2 stages.

sampsyo Jun 19, 2024
Maintainer

Right, this is a great way of framing it. Right now, the "DSL" is a very thin veneer over Ninja ASTs. And Ninja is—by design—super bare bones, so everything is explicit and nothing is easy. What we want instead is a different programming model with its own, much more convenient semantics that "compiles down" to Ninja.

jku20 · 2024-06-21T21:16:06Z

jku20
Jun 21, 2024
Collaborator Author

Update on Goals

So the "Little Hack Solution" is bad enough I don't think it is super worthwhile going through with. Implementing it was educational (and I might take things from it later), but I think I'll let #2134 get stale for a bit (and possibly close it).

Current goal is to read some more about program synthesis, then specify and solve this "path finding question" in isolation, and possibly understand its relation to build systems. (The more I think about it, the more I feel this build system connection isn't actually that useful to think about, though it may be interesting. Like, thinking about the "categorization of paths that go through a certain op" is kind of analogous to having to find paths which go through a "dirty" file. idk.)

The current approach, which @sampsyo wrote a little about somewhere but which I'm repeating here very concretely, means:

implementing some really slow "enumerate all programs" solution (our graphs are really small!)
probably less thinking about graphs (maybe come back to it)
try to figure out how to specify ops as rewrites and take a look at egraphs to figure out all the things which can be made with these rewrites (seems promising)
figure out the structure of the program and use an smt solver to fill in the details (promising also)
optimization (idk, I'll think about it later)

The DSL is probably going to go on hold right now.

Current progress: the hacky thing seems to work in its super constrained way, and it looks backwards compatible. I've also done some reading about egraphs (egg and egglog) and program synthesis using Petri nets and SMT solvers. Both seem promising.


jku20 · 2024-07-08T16:28:50Z

jku20
Jul 8, 2024
Collaborator Author

On Finding Plans (sequence of ops taking inputs to outputs)

#2134 merged! I will follow this up with a simple "enumeration of all plans" solution and ways to specify ops with multiple inputs and outputs.

Past that, I spent a bit more time reading. The "APIs" fud2 deals with are much smaller than for a lot of studied things. That isn't where most of my effort has been.

Plan

Get the followup prs up.
Choose a program synthesis paper to just implement and think a bit less.
Figure out a quantitative metric to compare different plan finding algorithms.

The last bullet is kind of vague. That's intentional; it's still not totally clear how to quantify the "goodness" of an algorithm (for example, performance doesn't work because once a pretty low threshold is hit, it doesn't matter too much).

The DSL

There are three things in ninja to think about: variables, rules, and builds. One way to think about these is:

variables are settings for the build
rules are the "how," fancy shell commands
builds are assertions that certain shell commands take one file to another

The goal is to encode these three things in a language which is nice to write. Here is an example showcasing what would exist:

fn do_smth(in, out, another_flag) {
    flag := "a" if config("hello") else "b"
    shell("./bin {in} --arg {flag} {another_flag} -o {out}")
}

defop calyx_to_verilog(c: calyx) -> v: verilog {
    foo := "foo"
    shell("do one thing")
    do_smth(c, v, foo)
}

Here do_smth is like a rule, flag is like a variable, and defop is like a build command. This mapping isn't quite 1 to 1 (and it shouldn't be; otherwise, why not just use ninja!). For example, calyx_to_verilog has more than do_smth in it, allowing shell commands to be defined with the op. There is also some scripting capability shown in do_smth.
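For reference, here is roughly the Ninja text such a definition could lower to, sketched as two small Python helpers. The helper names and the file names are made up for the example, and this is not how fud2 actually emits Ninja; only the rule/build/variable syntax is standard Ninja.

```python
def ninja_rule(name, command):
    """Render a Ninja rule: the "how", a shell command template."""
    return f"rule {name}\n  command = {command}\n"


def ninja_build(rule, inputs, outputs, variables=None):
    """Render a Ninja build statement with optional per-build variables."""
    lines = [f"build {' '.join(outputs)}: {rule} {' '.join(inputs)}"]
    for key, value in (variables or {}).items():
        lines.append(f"  {key} = {value}")
    return "\n".join(lines) + "\n"


# $in and $out are Ninja's built-in variables; flag is set per build.
text = ninja_rule("do_smth", "./bin $in --arg $flag -o $out")
text += ninja_build("do_smth", ["a.futil"], ["a.sv"], {"flag": "a"})
print(text)
```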

I think using Rhai still would be nice because some of @sgpthomas's work can be reused and then the scripting comes for free.

The current plan

Rewrite the current Rhai integration to make the new syntax possible.
Implement the new syntax into Rhai.

The above two things should probably be separate prs.

1 reply

sampsyo Jul 11, 2024
Maintainer

Figure out a quantitative metric to compare different plan finding algorithms.

The last bullet is kind of vague. That's intentional; it's still not totally clear how to quantify the "goodness" of an algorithm (for example, performance doesn't work because once a pretty low threshold is hit, it doesn't matter too much).

Yeah, this would be good to think about some more. I think we should not shy away from doing some synthetic experiments with generated op-graphs that are unrealistically large, just to understand the scaling behavior of the search algorithms. This would just help us understand where, in a hypothetical future where a fud-core-based tool gets bigger, it might start running into problems.

Rewrite the current Rhai integration to make the new syntax possible.

Implement the new syntax into Rhai.

The above two things should probably be separate prs.

Strong agree; this would help keep things intelligible.

jku20 · 2024-07-11T17:55:57Z

jku20
Jul 11, 2024
Collaborator Author

Update

A functional draft of the dsl has been implemented! See #2203. Not much else, but yeah.
Current things to do:

Make sure error handling for invalid Rhai scripts works well (I don't know if this ever worked well though so probably a different PR).
Rewrite old ops in the new style. This is another "new PR" thing, but I might want to do one right now to see how it is.

1 reply

sampsyo Jul 11, 2024
Maintainer

Sounds great. I will take a look at #2203 posthaste.

It seems nice if we can do some kind of incremental migration to the new syntax without doing everything all at once… but I dunno, that's not too important if it proves easier to do everything all at once.

jku20 · 2024-07-18T19:28:11Z

jku20
Jul 18, 2024
Collaborator Author

Update

More work on #2203 and a little more program synthesis reading. Progress exceedingly slow for no particular reason.

1 reply

jku20 Jul 19, 2024
Collaborator Author

Adding to this, I've now started on a planner using egg.

jku20 · 2024-07-30T15:12:09Z

jku20
Jul 30, 2024
Collaborator Author

Using Egg to Optimize an Enumerative Search for Programs

This is copy pasted from zulip but I think is worth putting here as an update.

Problem Statement
You are given a set of "ops" and a set of "states." Ops are functions taking in (but not consuming) a set of states and returning sets of states.

You are tasked with finding a sequence of ops which takes some initial set of states to a new set of states with the constraint that there is a set of ops which must be used in this sequence. By being "used" in this sequence, I mean the op must be present in the sequence AND its outputs, if they were removed, would make construction of the final set of outputs impossible.

If states are thought about as types, then this problem reduces to creating a function out of ops, an API of sorts, which satisfies a given type signature.

This problem is researched, but generally in the context of large APIs (1000s of functions and types). fud2 is different in that it doesn't have this many ops and states (I think a couple hundred is probably a reasonable, future-proof bound) but cares more about the structure of the result: it must have a set of required ops (and bonus points if there are other ways to tweak the output).

It is certainly possible to implement these solutions and they would probably work well with some small changes to be more particular about the structure of the resulting programs. They are complicated to implement though and often solve harder problems than needed.

Enumerative Solution
Because of the current small number of ops and states, a simple enumerative solution (enumerate over all programs in order of length) is currently possible (though I wouldn't call that future proof).
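A minimal Python sketch of that enumeration, under two simplifying assumptions: ops are a dict from name to (input states, output states), and a required op only has to appear in the sequence. That is weaker than the definition of "used" in the problem statement, which also requires the op's outputs to be necessary.

```python
from itertools import product


def enumerate_plans(ops, inputs, outputs, required, max_len):
    """Return the first shortest op sequence producing outputs, or None.

    ops maps an op name to (set of input states, set of output states).
    """
    for length in range(max_len + 1):
        for seq in product(sorted(ops), repeat=length):
            if not set(required) <= set(seq):
                continue
            states = set(inputs)
            valid = True
            for name in seq:
                ins, outs = ops[name]
                if not ins <= states:
                    valid = False
                    break
                states |= outs  # ops do not consume their inputs
            if valid and set(outputs) <= states:
                return list(seq)
    return None


ops = {
    "a": ({"x"}, {"y"}),
    "b": ({"y"}, {"z"}),
    "c": ({"x"}, {"z"}),
}
print(enumerate_plans(ops, {"x"}, {"z"}, [], 3))
print(enumerate_plans(ops, {"x"}, {"z"}, ["b"], 3))
```

The number of candidate sequences grows exponentially with length, which is fine for graphs this small but is the reason it isn't future proof.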

Encoding using Egg
Egg can be used to optimize this enumerative solution. Assign each op and state a number. A set of states which was created by a set of ops can be represented as a pair of lists of "x"s and "c"s. The ith element of the first list is "c" iff state i exists in the set of states, else it is "x". Similarly, the ith element of the second list is "c" iff op i was used to create the set of states. For example, if there are 3 states and 2 ops, the set of states {1, 3} created using op 2 could be represented by the string (c x c) (x c).

Ops can then be described as rewrites to this string. For example, if op 1 takes state 1 to state 2, it would do the following to the above string: (c x c) (x c) => (c c c) (c c). State 1 is still kept around as ops do not consume states. Egg can be used to encode these rewrites on these strings. The desired set of states and ops can then be encoded to a string and searched for. Egg supports functionality for retrieving the set of rewrites used to turn an initial string into a final string. This can be directly translated into a sequence of ops.
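The encoding and a single rewrite step can be mimicked in plain Python to check the example. This does not use egg at all; the function names are made up, and states and ops are numbered from 1 as in the text.

```python
def encode(num_states, num_ops, states, used_ops):
    """Encode (states present, ops used) as the "(c x ...) (c x ...)" string."""
    state_part = " ".join(
        "c" if i in states else "x" for i in range(1, num_states + 1)
    )
    op_part = " ".join(
        "c" if i in used_ops else "x" for i in range(1, num_ops + 1)
    )
    return f"({state_part}) ({op_part})"


def apply_op(states, used_ops, op_index, op_inputs, op_outputs):
    """Apply one op as a rewrite; inputs are kept (not consumed)."""
    if not op_inputs <= states:
        return None  # the rewrite does not match
    return states | op_outputs, used_ops | {op_index}


states, used = {1, 3}, {2}
print(encode(3, 2, states, used))
# Op 1 takes state 1 to state 2.
states, used = apply_op(states, used, 1, {1}, {2})
print(encode(3, 2, states, used))
```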

Optimizing Egg Solution
This direct encoding, however, creates e-graphs that are too large. To solve this, sacrifice a little bit of completeness to get a working solution using egg. Add the constraint "every state will only be used as input to a single op." Another way to say this is "make ops consume input states." Using the example from above, op 1 would take (c x c) (x c) => (x c c) (c c). This creates a manageably sized e-graph (I'm currently investigating how effective this is on larger graphs and on graphs with different states, along with further testing).
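
The trade-off can be seen on a toy example. The Python sketch below counts the distinct (states, ops used) pairs reachable with and without consuming inputs. It is a plain graph traversal, not egg, and op 2 (state 1 to state 3) is again a made-up op for illustration.

```python
# Op number -> (input states, output states). Op 2 is hypothetical.
OPS = {1: ({1}, {2}), 2: ({1}, {3})}

def apply_op(op, states, used, consume):
    """Apply `op`; when `consume` is true, the op's input states are removed."""
    inputs, outputs = OPS[op]
    if not inputs <= states:
        return None
    kept = states - inputs if consume else states
    return frozenset(kept | outputs), frozenset(used | {op})

def reachable(start_states, consume):
    """Return every (states, used) pair reachable from the start states."""
    start = (frozenset(start_states), frozenset())
    seen = {start}
    frontier = [start]
    while frontier:
        states, used = frontier.pop()
        for op in OPS:
            node = apply_op(op, states, used, consume)
            if node is not None and node not in seen:
                seen.add(node)
                frontier.append(node)
    return seen

# (c x c) (x c) => (x c c) (c c) when op 1 consumes state 1.
print(apply_op(1, frozenset({1, 3}), frozenset({2}), consume=True))
print(len(reachable({1}, consume=False)), len(reachable({1}, consume=True)))
```

In this toy graph the consuming variant explores fewer pairs, but it can no longer produce states 2 and 3 together from state 1, since whichever op runs first uses up state 1. That is the completeness being given up.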

Further Updates

Testing is in progress, along with documentation (a short tutorial on creating an example op exists, but the actual additions need to be documented more precisely).

2 replies

sampsyo Jul 31, 2024
Maintainer

This is really great; thanks for writing this up! This really does clarify some things especially about how the egg-based planner works. As I mentioned briefly in #2222 (review), what would you think about putting this description of the egg-based strategy into a doc comment, to associate it with the code for posterity?

jku20 Jul 31, 2024
Collaborator Author

Makes sense.

jku20 · 2024-08-09T05:46:57Z

jku20
Aug 9, 2024
Collaborator Author

Summer 2024 Wrapup

The main contributions are a working syntax (and a backend for that syntax) for writing fud2 ops. This should be functional modulo the items below and probably some bugs/bad error messages I discover as I use it further. Progress in the past week has mostly been small improvements, getting larger PRs merged, the items below, and other related but not always super productive endeavors (e.g. poster making).

I think the big three important but unfinished parts of this project are:

1. work towards a better planner (and possibly benchmarks on the current planners on different types of graphs, though with only two planners, I'm not sure how educational this really is)
2. migration of old ops to the new syntax
3. generate nicer ninja files

Plans: I plan to punt 1 into the future. The current methods for finding plans aren't ideal, but are functional.

I think the goal will be to tackle 2 and then 3, probably having 2 inform 3 a bit (and possibly working on 3 while doing 2 as needed). Using shell_deps commands should mean the ninja generated by the new scripts won't be super different from the current ninja.

As the generated ninja doesn't depend on the config file or input/output file names, verifying that the new ninja is identical to the old ninja on a single test is sufficient to confirm that the migrated op agrees with the original. This is easy to check by running the generated ninja with ninja -t commands to print what the executed commands would be and diffing the results of the old and new ops.
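
The check could be scripted along these lines. This is only a sketch: the file names `old.ninja` and `new.ninja` are placeholders, and it assumes both files have already been generated and that `ninja` is on the PATH.

```python
import difflib
import subprocess

def ninja_commands(build_file):
    """Return the commands ninja would run for `build_file`, without running them."""
    result = subprocess.run(
        ["ninja", "-f", build_file, "-t", "commands"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

def diff_commands(old, new):
    """Return a unified diff of two command lists; an empty list means they agree."""
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))

if __name__ == "__main__":
    # Placeholder file names for the ninja emitted by the old and new ops.
    diff = diff_commands(ninja_commands("old.ninja"), ninja_commands("new.ninja"))
    print("\n".join(diff) if diff else "old and new ninja files agree")
```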

As a timeline for when this migration will get done, I don't really have one. The hope was it would be done by now, but that is not the case.

I'll make a best effort to get 2 and 3 done, and more generally to support fud2, when I have leftover time, but I can't give a specific timeline, sorry; leftover time is not always easy to predict.

1 reply

sampsyo Aug 11, 2024
Maintainer

Sounds good! And completely understandable that you will need to focus on other things. I hope that someday we can delete much of the old Rust op implementations so we can end the era of duplicates.

The Calyx Infrastructure
