files as dependency #867
Comments
I definitely see why this would be useful, and it's a useful feature of Make. It's tricky though! Let's use a simplified example. In this justfile, I'm using a string dependency to mean "this dependency is actually a path to an input file". Here's a simple recipe that just copies an input file to an output file:
However, this isn't yet enough information.
Now just knows the input file and the output file, and can only run the recipe when There are some gotchas though. If you were using a compiler, and updated the compiler to a new version,
Similarly, if you were using compiler flags that changed, just consider those flags to be part of the "input". For example, if you changed
My hunch is that this is just too hard to get right enough for it to be worth adding, and that there would be a confusing long-tail of issues. It's not a satisfying answer, but you could always create a In order for me to be convinced that this was a good idea, I think someone would have to present a design which was relatively simple, with straightforward limitations that would make sense to users, and didn't look like it would lead to a long-tail of weird issues. I'll leave this open in the hope that someone comes up with such a design! |
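For readers who want to see the timestamp rule under discussion, here is a minimal sketch in Python. It is not how just or Make implement anything; `is_stale` and `demo` are hypothetical names chosen for illustration, and the explicit timestamps exist only to keep the demo independent of filesystem clock resolution.

```python
import os
import tempfile
from pathlib import Path


def is_stale(inputs, output):
    """Return True if `output` is missing or older than any of `inputs`."""
    out = Path(output)
    if not out.exists():
        return True
    out_mtime = out.stat().st_mtime_ns
    return any(Path(p).stat().st_mtime_ns > out_mtime for p in inputs)


def demo():
    """Walk through the three cases: missing output, fresh output, updated input."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "input.csv")
        dst = Path(tmp, "output.csv")
        src.write_text("a,b\n")
        results = [is_stale([src], dst)]  # output does not exist yet

        dst.write_text(src.read_text())   # "run the recipe"
        # Set explicit (atime, mtime) values in nanoseconds.
        os.utime(src, ns=(1_000_000_000, 1_000_000_000))
        os.utime(dst, ns=(2_000_000_000, 2_000_000_000))
        results.append(is_stale([src], dst))  # output newer than input

        os.utime(src, ns=(3_000_000_000, 3_000_000_000))
        results.append(is_stale([src], dst))  # input newer than output
        return results
```

Note that nothing in this check notices a changed compiler or changed flags, which is exactly the gotcha raised above: those would have to be modelled as extra inputs.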
You could also embed a ninja file inside a justfile, which is actually kind of nice:
|
(Perhaps not super useful though, since you're using a shebang recipe, and I don't think it's possible to embed non-shell commands in ninja.) |
This is the reason I'm not really considering

Input>output date-based caching isn't perfect, but it's still portable, it's easy to understand, and it doesn't rely on some stateful process that won't survive a reboot etc.

It's not as if just is perfect. You mentioned the possibility of a compiler update. Well, considering you call commands just with a name, a simple alias (perhaps not a shell alias, but still) with different behavior can also break things perfectly fine. Provided your commands don't crap out with unexpected input, at worst an input>output caching will fail with unexpected error output and you'll have to go back to the only behavior currently officially supported, just (is that where the name is from?) running everything.

I agree that the default behavior and often the syntax of make are problematic. But still, that basic behavior is, while not perfect, too valuable for me to give up. Not every command reuses old results like e.g. cargo, and in that case just with properly declared dependencies becomes unusable for many workflows. The proposed solutions, using another build tool or keeping a watch process running, are not sufficient IMO and have more issues than supporting input>output would.

So, after that long rant, I'm far from a finished solution, but I believe that idea might be beneficial (though it would probably require a lot of work, even if hopefully only additive in behavior). One could add a way to enable caching, so that a recipe is skipped when some "was changed" validation returns false. Combined with a helper (provided with appropriate warnings that it's not perfect) that implements the input date>output date validation, and perhaps an easy way to define other helpers (e.g. one that checks for changes of the compilers used) and combine them, this would solve the issue for me and might be acceptable to you.

Where I see the most issues is in terms of syntax and in how to handle these validators without introducing a complex system. Perhaps one could define them similarly to normal tasks, but they'd still behave differently, as they have to somehow signify true/false. |
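To make the "combinable validators" idea concrete, here is a rough sketch in Python rather than in any proposed just syntax. Everything in it (`Validator`, `any_changed`, `run_cached`) is a hypothetical name invented for illustration; it only shows the control flow of "skip the recipe when every validator reports no change".

```python
from typing import Callable

# A validator returns True when something it watches has changed.
Validator = Callable[[], bool]


def any_changed(*validators: Validator) -> Validator:
    """Combine validators: the result reports a change if any of them does."""
    return lambda: any(v() for v in validators)


def run_cached(changed: Validator, recipe: Callable[[], None]) -> bool:
    """Run `recipe` only if `changed()` is True; return whether it ran."""
    if changed():
        recipe()
        return True
    return False
```

A date-based helper and a "compiler version changed" helper would both be plain validators here, so combining them needs no extra machinery. How such validators would be declared inside a justfile is the open syntax question raised above.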
I'm open to this being implemented, but it's a lot of work, and there are a lot of tricky open questions. Make is a kitchen sink of weird features needed to support edge cases, and many aspects of its design are widely regarded as very bad. The best way forward is for someone to figure out how to make I suspect that |
I believe Redo, Ninja and others keep a hash of each source file, and when processing is invoked, the current hash of each source file is compared against the cached hash to determine whether it has been modified since the last processing run. I created a standalone tool that gives shell scripts and the like the same functionality: https://github.com/runeimp/filedelta . This sort of functionality obviates the need to check whether the source is newer than the target, so only the source needs to be checked, which is handy for dependency checking. It also still works if the target is accidentally "touched" after the last modification to the source. The basic process is pretty simple and could be extended to take a "context" hash or string as well, for cases where the source file may not have changed but the recipe needs to be run in a new context. |
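The hash-plus-context process described above could look roughly like the following Python sketch. This is not filedelta's implementation; `digest`, `has_changed`, the JSON store, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def digest(source, context=""):
    """Hash the file contents together with an optional context string."""
    h = hashlib.sha256()
    h.update(Path(source).read_bytes())
    h.update(b"\0")  # separator so content and context cannot run together
    h.update(context.encode())
    return h.hexdigest()


def has_changed(source, store, context=""):
    """Compare the current digest with the stored one, then record the new one."""
    store = Path(store)
    known = json.loads(store.read_text()) if store.exists() else {}
    current = digest(source, context)
    key = str(source)
    changed = known.get(key) != current
    known[key] = current
    store.write_text(json.dumps(known))
    return changed


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "input.csv")
        store = Path(tmp, "hashes.json")
        src.write_text("a,b\n")
        first = has_changed(src, store)                  # nothing recorded yet
        second = has_changed(src, store)                 # same content, same context
        third = has_changed(src, store, context="-O2")   # same content, new context
        src.write_text("a,b,c\n")
        fourth = has_changed(src, store, context="-O2")  # new content
        return [first, second, third, fourth]
```

One simplification to be aware of: this records the new digest as soon as the check runs. A real tool would more likely record it only after the dependent step has succeeded, so that a failed build is retried.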
Any time I use a justfile for a project, I end up limited by this issue. The only concrete syntax proposed here was:

    foo: "input.csv" > "output.csv"
        cat input.csv > output.csv

which I'm personally not a fan of, for a few reasons:

Attributes syntax

Looking at the existing syntax already supported by Just, I think a good place to think about this would be Recipe Attributes. A quick (and probably bad) example:

    [if(updated(input.csv))]
    foo:
        cat input.csv > output.csv

I disagree with @casey that the output file is needed info for this to work. We can choose to compute a hash for the input, store it somewhere, and run only once it changes.

Custom user commands

As an alternative, or maybe even an addition, we can also open up the possibility of people specifying their own arbitrary conditions:

    needs_build := `find -name input.csv -newer output.csv`
    [if(needs_build)]
    build:
|
Just could consider incorporating an approach from makesure. Makesure has Thus you can easily use bash's

    @goal regenerate_file2_based_on_file1
    @reached_if [[ file2 -nt file1 ]]
    do_something file1 > file2

Here is an example from a real project:

    @goal inventory_generated
    @depends_on output_ready__instance_public_ip
    @reached_if [[ -f "$INVENTORY" ]] && [[ "$INVENTORY" -nt 'inventory.tpl.yml' ]]
    @doc 'inventory-<ENV>.yml generated from terraform data'
    instance_public_ip="$(cat "$INSTANCE_PUBLIC_IP")"
    awk -v instance_public_ip="$instance_public_ip" '
    {
    gsub(/\$instance_public_ip/, instance_public_ip)
    gsub(/\$ENV/, ENVIRON["ENV"])
    print
    }
    ' inventory.tpl.yml > "$INVENTORY"

Obvious cons of this approach: |
I like this idea. What’s the benefit of storing a hash vs the last modified time? One could also store checksums for the justfile task, to check if you’ve changed it since last run. Storing modified times for binaries used should also be possible. |
The main benefit of hashing is simply that it doesn't require access to the target file to determine whether the source changed. This can be important on certain file systems where you cannot guarantee that modification times are stable. This is rarely the case, but when it is, hashing the source can guarantee that a needed build (or whatever) happens. It can also be important when building the source is incredibly costly in some way and you only want to rebuild (or whatever) when the source has definitely changed, as opposed to someone opening the source to review it, making NO changes, and accidentally saving it out of habit, thus changing its modification time to be newer than the target's. |
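The "saved without changes" scenario is easy to reproduce. The sketch below, in Python with a hypothetical helper name `compare_after_touch`, bumps a file's modification time without altering its bytes and reports what each detection strategy would conclude. SHA-256 is an arbitrary choice here.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def compare_after_touch():
    """Return (mtime_changed, hash_changed) after a content-preserving touch."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "input.csv")
        src.write_text("a,b\n")
        os.utime(src, ns=(1_000_000_000, 1_000_000_000))
        mtime_before = src.stat().st_mtime_ns
        hash_before = hashlib.sha256(src.read_bytes()).hexdigest()

        # Simulate "open, change nothing, save": same bytes, newer timestamp.
        os.utime(src, ns=(2_000_000_000, 2_000_000_000))

        mtime_changed = src.stat().st_mtime_ns != mtime_before
        hash_after = hashlib.sha256(src.read_bytes()).hexdigest()
        return mtime_changed, hash_after != hash_before
```

A timestamp-based check would trigger a rebuild here, while a hash-based check would not. The trade-off is that hashing has to read every input in full on each check.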
I think it could be nice to use a function-like syntax to indicate source files. This way it is unambiguous - you don't have questions about whether your shell variable gets treated as a string, and you can mix and match file targets with other targets:

    othertarget:
        echo done

    foo: files(a.txt, b.c, d.csv, $foo_file) othertarget > files(x.o)
        cc ...

Or, just create something that is like the opposite of

    [file]
    %.c:

    [file]
    %.o: %.c
        gcc ...

    # Maybe regex could be used? Much more flexible and well known than Make's `%`
    [file]
    .*\.o: $1\.c:
        gcc ...

Doing this sort of thing with Make is always horrid because of how easy it is to mistype a pattern and get "no rules to make target" somewhere in the graph. Hopefully Just could support something better. |
I think I discovered the simplest solution with the most power: Add a recipe attribute It's simple to understand with no extra syntax introduced. Why is this so powerful? Because it gives access to EVERY function you can use in just
How do we include these "change detectors" in the recipe content without changing what runs? Put it in a comment.
(We can also add something in the rare case a The only weakness of this strategy is when recipe arguments change what is done, not how it is done. For example:
In the example, building The only way to solve it I think is to allow the
will evaluate I'm gonna start implementing this if there are no major objections and after I finish some other open-source work I've been doing. |
This is pretty interesting! It seems like it could be an interesting and low-impact way to support this. |
@tgross35 You're absolutely right 😅; they're practically the same. I had apparently only skimmed your solution without fully understanding it (subconscious plagiarism?). I think the main difference between yours and mine is that yours would only rerun if the directory name changes instead of when the directory contents change ( |
Great minds think alike 🙂
Indeed, that was just an example of how I am implementing pseudo-cached recipes (cmake handles the actual changed files in that example). I think that if there is a basic implementation of cached recipes, then it should be easy enough to improve the ergonomics later. Maybe:
|
Instead of a whole new function, I was thinking of expanding Something like In any case, I'll start working on recipe caching based on evaluated content since that isn't affected by these decisions. Sidenote @tgross35 : We have used the term "cache key" in different ways (don't know/care which is "correct").
Just wanted to mention it before it caused confusion. |
I was loosely thinking that a specific
By cache key I just meant whatever gets hashed to represent the state - variables/expressions, other computed hashes (file contents), or string literals. I'm not exactly sure what the

All in all, I imagined something like this being stored in the cache directory, if

    {
      "cached_recipes": [
        {
          "path": "/home/user/project/justfile",
          "recipe": "configure",
          // this is `blake3sum(blake3_files("**/*.c") + hash_env("CARGO.*") + some_other_var + ...)`
          "key_hash": "f7b2f545fb75d120c0dac039aff99ff472c21b170bf3d0714e7b9a34113e7f04",
          "last_run": "2024-01-21T08:40:52Z"
        }
        // ...
      ]
    }
|
I see. I recently thought of
Actually, it's kinda the opposite :) Here is the basic json structure I went with for the cache file:

    {
      "working_directory": "/typically/the/project/dir",
      "recipe_caches": {
        "recipe_name": {
          "hash": "2a680e74556e82b6f206e3e39c6abe3e28a6530c8494ea46e642a41b9ef7424a"
        }
      }
    }

So the hash is still the hash of the whole body, but when arguments are added to

    {
      "build:BIN=client": {
        "hash": "..."
      },
      "build:BIN=server": {
        "hash": "..."
      }
    }

Note: every |
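As a reading aid, here is a small Python sketch of how a cache shaped like the JSON above might be consulted. It is not the actual implementation being discussed: `cache_key` and `needs_run` are hypothetical names, SHA-256 stands in for whatever hash is really used, and the comma separator for multiple arguments is an assumption (the thread only shows the single-argument form `build:BIN=client`).

```python
import hashlib


def cache_key(recipe, args):
    """Build a key like 'build:BIN=client'; plain recipe name when no args."""
    if not args:
        return recipe
    # Assumption: multiple arguments are joined with commas.
    return recipe + ":" + ",".join(f"{k}={v}" for k, v in args.items())


def needs_run(cache, recipe, body, args=None):
    """Return True if the evaluated `body` differs from the cached hash.

    `cache` is a dict shaped like the cache file above; it is updated in place.
    """
    key = cache_key(recipe, args or {})
    new_hash = hashlib.sha256(body.encode()).hexdigest()
    entry = cache["recipe_caches"].get(key)
    stale = entry is None or entry["hash"] != new_hash
    cache["recipe_caches"][key] = {"hash": new_hash}
    return stale
```

Keying on the recipe name plus arguments means that alternating between, say, a client build and a server build does not evict the other's entry, while any change to the evaluated body of either one still invalidates only that entry.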
I created https://crates.io/crates/mkrs primarily for this reason (and to write the Makefile in markdown); it seems like the same approach may not fit in just's design. Mkrs distinguishes between file and non-file targets, and runs the recipe for a file target if |
Might I contribute a temporary and compact solution that doesn't require any dependencies except what you'd usually find on a Linux machine (md5sum, head and grep)? It works only on Linux and possibly under MinGW:

    # This is the task that you'd call
    build: (file "main.c") (file "other.c")
        echo "Finished compiling"

    # Use this task to add hashed file dependencies
    # and do the actual compilation
    [private]
    file filepath: (track filepath) && (hash filepath)
        echo "Compiling {{filepath}}"

    # Don't forget to add '.hashes' to gitignore
    # Exits non-zero (skipping the dependent recipe) when the stored hash matches
    [private]
    [no-exit-message]
    track file:
        #!/usr/bin/env bash
        [ ! -f .hashes ] && touch .hashes
        [[ "$(md5sum {{file}} | head -c 32)" == "$(grep " {{file}}$" .hashes | head -c 32)" ]] && exit 1 || exit 0

    # Replace the stored hash for this file with the current one
    [private]
    hash file: (track file)
        #!/usr/bin/env bash
        echo "$(grep -v " {{file}}$" .hashes)" > .hashes && md5sum {{file}} >> .hashes
|
Hi,

I'm glad to discover `just` since `make` is painful to write. However, I still like some dependencies to be actual files. The recipe should run if file exists or has been updated. "File exists" is easy to find a workaround and there are some workarounds for "if file changed" cases such as `watchexec` or tricks mentioned in #424. But I was still wondering if it's possible to use a file as dependency so that dependees run if file has changed or file is missing?

Just a hypothetical example:

Is it possible that `just process-data` does not run if `input.csv` has not changed?