From 1391dacf1b72ec4be57b8ed3cb88cf75c02c24f5 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Tue, 12 Oct 2021 17:00:38 -0400 Subject: [PATCH 01/20] nixpkgs-ifd: Copy template --- rfcs/0000-nixpkgs-ifd.md | 58 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) create mode 100644 rfcs/0000-nixpkgs-ifd.md diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md new file mode 100644 index 000000000..43569cf57 --- /dev/null +++ b/rfcs/0000-nixpkgs-ifd.md @@ -0,0 +1,58 @@ +--- +feature: (fill me in with a unique ident, my_awesome_feature) +start-date: (fill me in with today's date, YYYY-MM-DD) +author: (name of the main author) +co-authors: (find a buddy later to help out with the RFC) +shepherd-team: (names, to be nominated and accepted by RFC steering committee) +shepherd-leader: (name to be appointed by RFC steering committee) +related-issues: (will contain links to implementation PRs) +--- + +# Summary +[summary]: #summary + +One paragraph explanation of the feature. + +# Motivation +[motivation]: #motivation + +Why are we doing this? What use cases does it support? What is the expected +outcome? + +# Detailed design +[design]: #detailed-design + +This is the core, normative part of the RFC. Explain the design in enough +detail for somebody familiar with the ecosystem to understand, and implement. +This should get into specifics and corner-cases. Yet, this section should also +be terse, avoiding redundancy even at the cost of clarity. + +# Examples and Interactions +[examples-and-interactions]: #examples-and-interactions + +This section illustrates the detailed design. This section should clarify all +confusion the reader has from the previous sections. It is especially important +to counterbalance the desired terseness of the detailed design; if you feel +your detailed design is rudely short, consider making this section longer +instead. + +# Drawbacks +[drawbacks]: #drawbacks + +Why should we *not* do this? 
+ +# Alternatives +[alternatives]: #alternatives + +What other designs have been considered? What is the impact of not doing this? + +# Unresolved questions +[unresolved]: #unresolved-questions + +What parts of the design are still TBD or unknowns? + +# Future work +[future]: #future-work + +What future work, if any, would be implied or impacted by this feature +without being directly part of the work? From 71c4f28627a340489667067c428f2d077c3dca23 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Tue, 12 Oct 2021 18:00:05 -0400 Subject: [PATCH 02/20] nixpkgs-ifd: First draft --- rfcs/0000-nixpkgs-ifd.md | 116 ++++++++++++++++++++++++++++++++------- 1 file changed, 96 insertions(+), 20 deletions(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 43569cf57..65fdc9e17 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -1,7 +1,7 @@ --- -feature: (fill me in with a unique ident, my_awesome_feature) -start-date: (fill me in with today's date, YYYY-MM-DD) -author: (name of the main author) +feature: nixpkgs-ifd +start-date: 2021-10-12 +author: John Ericson (@Ericson2314) co-authors: (find a buddy later to help out with the RFC) shepherd-team: (names, to be nominated and accepted by RFC steering committee) shepherd-leader: (name to be appointed by RFC steering committee) @@ -11,48 +11,124 @@ related-issues: (will contain links to implementation PRs) # Summary [summary]: #summary -One paragraph explanation of the feature. +Many of us want IFD to be allowed in Nixpkgs. +IFD would be terrible for `hydra.nixos.org`, though, +The solution is to cut the Gordian knot: Allow IFD in Nixpkgs while still (effectively) prohibiting it in CI. # Motivation [motivation]: #motivation -Why are we doing this? What use cases does it support? What is the expected -outcome? +Nixpkgs, along with every other distro, also faces a looming crisis: new open source software is increasingly not intended to be packaged by distros at all. 
+Many languages now support very large library ecosystems, with dependencies expressed in a language-specific package manager. +To this new generation of developers, the distro (or homebrew) is a crufty relic from an earlier age to bootstrap modernity, and then be forgotten about. + +Right now, to deal with these packages, we either convert by hand, or commit lots of generated code into Nixpkgs. +But I don't think either of those options is healthy or sustainable. +The problem with the first is sheer effort; we'll never be able to keep up. +The problem with the second is bloating Nixpkgs but more importantly reproducability: If someone wants to update that generated code it is unclear how. +All these mean that potential users coming from this new model of development find Nix / Nixpkgs cumbersome and unsuited to their needs. + +The solution *outside* of Nixpkgs is today is "import from derivation", i.e. building code in Nix to be consumed at eval time. +Many institutional users of Nix use this. +But while the practice is banned in Nixpkgs, those efforts are not very coordinated, and the `lang2nix` ecosystem has a hard time getting off the ground. + +I am *not* arguing that IFD is the best possible solution. +But it's the one we've got to day, and long term alternatives, like RFC #92, face *significant* hurdles in being ergonomic and integrating with current idioms in Nixpkgs -- e.g. the `meta` on every derivation from `mkDerivation`. +In the spirit of learning to walk before learning to run, and beginning to acknowledge addresses these problems, we are best-serviced by getting IFD in Nixpkgs as a first-gen solution as soon as possible. +The only barrier then is addressing eval resource usage costs. # Detailed design [design]: #detailed-design -This is the core, normative part of the RFC. Explain the design in enough -detail for somebody familiar with the ecosystem to understand, and implement. -This should get into specifics and corner-cases. 
Yet, this section should also -be terse, avoiding redundancy even at the cost of clarity. +## Nixpkgs + +1. Add a new `enableIFD` config parameter to Nixpkgs. + When it is `false`, anything using IFD must be disabled so that a regular evaluation like we do today succeeds. + +2. Add a new `allImportedDerivations` top-level attribute. + This *must* be buildable with `enableIFD = false`. + It *must* have in its runtime closure any derivation output that Nixpkgs with `enableIFD = true` imports. + +3. Any code vendored in Nixpkgs must correspond to code produced in a derivation, so the code can be mechanistically re-vendored. + +## Hydra policy + +Instead of kicking of single evaluations of Nixpkgs, we will kick of double evaluations: + + 1. Evaluate Nixpkgs normally. + + 2. Build `allImportedDerivations`, and copy its closure to the evaluation machine. + + 3. Evaluate Nixpkgs with `enableIFD = true`, with the closure added to the eval paths whitelist, and with IFD partially "allowed, but with `-j0`". + What this means is no building can happen at eval time, but we can import derivations that are already built. # Examples and Interactions [examples-and-interactions]: #examples-and-interactions -This section illustrates the detailed design. This section should clarify all -confusion the reader has from the previous sections. It is especially important -to counterbalance the desired terseness of the detailed design; if you feel -your detailed design is rudely short, consider making this section longer -instead. +TODO, demonstrate changes to Nixpkgs, e.g. using the Haskell infrastructure, in a fork, and link? # Drawbacks [drawbacks]: #drawbacks -Why should we *not* do this? +1. > We've *doubled* the amount of evaluation we do, oh no! + + Sounds scary, I know. + But I don't think that's bad, actually. + What's so bad today is the time and memory usage of *each* evaluation. 
+ + You can think of that as a bunch of ungainly massive rectangular tiles we are trying to fit on a floor, the floor being our machine resources to schedule. + What would be really bad is *increasing the tile size*. + This means we need a bigger floor or else laying tile is harder. + What this is doing is *increasing the number of times*. + We can simple add a second floor to solve any problems that arise from that. + + This is simplification, sure, but I think the parable is correct to the "real" situation, too. + +2. > Isn't IFD really slow? + + What is slow is that evaluator has no parallelism. + That means is that every time we hit an *unbuilt* derivation, we block until it's finished building. + Worse, even if it is easy to run, we're probably going to check some substituters, etc., so there are all sorts of slow IO round trips making the critical path worse. + We could fix this, but there is no energy to do so right now. + Making the evaluator parallel without making our memory issues worse is hard work. + + But, none of that matters for this proposal. + `hydra.nixos.org` will only need to read built paths, and that shouldn't be meaningfully slower than regular `import`-ing. + +3. > IFD, is too controversial, don't do it! + + I think this is a classic example of don't let the perfect be the enemy of the good. + The problems with IFD and the problems IFD is trying trying to address both don't let a lot of attention. + The fact of the matter is Nixpkgs is how this community coordinates with itself, and agrees on priorities. + If it isn't being used in Nixpkgs, there is hard ceiling of how much attention it will get. + + The benefits of IFD don't get enough attention. + A package in Nixpkgs is more than a derivation: there's the `meta` as I mentioned above. + There's also being able to read the code and (somewhat) understand what's going in reference to other derivations. + Finally, there's being able to `override`, `overrideAttrs`, etc. 
the derivation downstream. + IFD alone allows computed packages that follow all these norms. # Alternatives [alternatives]: #alternatives -What other designs have been considered? What is the impact of not doing this? +The only alternative that isn't massively harder is doing nothing. # Unresolved questions [unresolved]: #unresolved-questions -What parts of the design are still TBD or unknowns? +Should we call it `IFD`, or should we give it a different name? +`builtins.readFile ` is really the same thing for our purposes, so I am sympathetic to renaming. # Future work [future]: #future-work -What future work, if any, would be implied or impacted by this feature -without being directly part of the work? +In a grand future we might do things completely differently. +But I have no idea how stuff is going to shake out. +In particular, if we don't do something like this, I don't think we will ever get to that future. +So even if this technically "barking up the wrong tree", I think it is a necessary first step to get things going. + +I will say, though, with these steps, I think we will be able to successfully convert to Nix a bunch of developers that mainly work in one language, and didn't even think they were in need of a better distro. +In turn, I hope these upstream packages and ecosystems might even care about packaging and integration of the sort that we do. +This would create a virtuous cycle where Nix is easier to use by more people, and Nixpkgs is easier to maintain as upstream packages better match our values. + +The most important future work is technical, but being able to win upstream developer hearts and minds better than before, because ultimately distribution's live and die by upstream's decisions. 
From 9b343dfc3951d2487b26b15327999a27859fbd84 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Tue, 12 Oct 2021 18:34:30 -0400 Subject: [PATCH 03/20] nixpkgs-ifd: Mention only one round of dynamism --- rfcs/0000-nixpkgs-ifd.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 65fdc9e17..faa5a0411 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -108,6 +108,17 @@ TODO, demonstrate changes to Nixpkgs, e.g. using the Haskell infrastructure, in Finally, there's being able to `override`, `overrideAttrs`, etc. the derivation downstream. IFD alone allows computed packages that follow all these norms. +4. Say I want IFD that depends on other IFD? + + In other words, can one import a derivation that is itself evaluated from and import from a derivation? + No, not without introducing another round of building and evaluating for Hydra. + But I don't think we need arbitrarily-deep dynamism anyways: it is a tool that should be used with care anyways, because stasis \[staticism?\] is the goodly disciplinarian that makes Nixpkgs so great. + + That said, `cabal2nix` is written in Haskell, `crate2nix` is written in Rust, etc. etc. + We can vendor enough code to build these tools and thus bootstrap the IFD we will do. + Per the 3rd rule for Nixpkgs above, as long as we make the vendoring automatic and pure, this is fine, and improvement upon today. + Also, even if we didn't have the "one round of dynamism" restriction, we would still have the bootstrapping issue. 
+ # Alternatives [alternatives]: #alternatives From a43045bd4fce1d5f97802d74f0764c2ccc83f560 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 13 Oct 2021 13:35:23 -0400 Subject: [PATCH 04/20] nixpkgs-ifd: Define IFD --- rfcs/0000-nixpkgs-ifd.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index faa5a0411..62d2b6bda 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -11,7 +11,7 @@ related-issues: (will contain links to implementation PRs) # Summary [summary]: #summary -Many of us want IFD to be allowed in Nixpkgs. +Many of us want "import from derivation" \[hereafter, IFD\] be allowed in Nixpkgs. IFD would be terrible for `hydra.nixos.org`, though, The solution is to cut the Gordian knot: Allow IFD in Nixpkgs while still (effectively) prohibiting it in CI. @@ -42,12 +42,12 @@ The only barrier then is addressing eval resource usage costs. ## Nixpkgs -1. Add a new `enableIFD` config parameter to Nixpkgs. +1. Add a new `enableImportFromDerivation` config parameter to Nixpkgs. When it is `false`, anything using IFD must be disabled so that a regular evaluation like we do today succeeds. 2. Add a new `allImportedDerivations` top-level attribute. - This *must* be buildable with `enableIFD = false`. - It *must* have in its runtime closure any derivation output that Nixpkgs with `enableIFD = true` imports. + This *must* be buildable with `enableImportFromDerivation = false`. + It *must* have in its runtime closure any derivation output that Nixpkgs with `enableImportFromDerivation = true` imports. 3. Any code vendored in Nixpkgs must correspond to code produced in a derivation, so the code can be mechanistically re-vendored. @@ -59,7 +59,7 @@ Instead of kicking of single evaluations of Nixpkgs, we will kick of double eval 2. Build `allImportedDerivations`, and copy its closure to the evaluation machine. - 3. 
Evaluate Nixpkgs with `enableIFD = true`, with the closure added to the eval paths whitelist, and with IFD partially "allowed, but with `-j0`". + 3. Evaluate Nixpkgs with `enableImportFromDerivation = true`, with the closure added to the eval paths whitelist, and with IFD partially "allowed, but with `-j0`". What this means is no building can happen at eval time, but we can import derivations that are already built. # Examples and Interactions @@ -127,7 +127,7 @@ The only alternative that isn't massively harder is doing nothing. # Unresolved questions [unresolved]: #unresolved-questions -Should we call it `IFD`, or should we give it a different name? +Should we call it "import from Derivation", or should we give it a different name? `builtins.readFile ` is really the same thing for our purposes, so I am sympathetic to renaming. # Future work From 183218c994066082f3789135d295e3d710727a34 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 13 Oct 2021 13:41:28 -0400 Subject: [PATCH 05/20] nixpkgs-ifd: Make enforcement clearer Thanks @L-as --- rfcs/0000-nixpkgs-ifd.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 62d2b6bda..f53cc5bee 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -47,9 +47,11 @@ The only barrier then is addressing eval resource usage costs. 2. Add a new `allImportedDerivations` top-level attribute. This *must* be buildable with `enableImportFromDerivation = false`. - It *must* have in its runtime closure any derivation output that Nixpkgs with `enableImportFromDerivation = true` imports. + It *must* have in its run-time closure any derivation output that Nixpkgs with `enableImportFromDerivation = true` imports. + \(CI will verify these conditions as described in the next subsection.\) -3. Any code vendored in Nixpkgs must correspond to code produced in a derivation, so the code can be mechanistically re-vendored. +3. 
Any code vendored in Nixpkgs *must* correspond to code produced in an imported derivation, so the code can be mechanistically re-vendored. + We should write tests that each pair of vendored and computed derivations are the same. ## Hydra policy @@ -59,8 +61,8 @@ Instead of kicking of single evaluations of Nixpkgs, we will kick of double eval 2. Build `allImportedDerivations`, and copy its closure to the evaluation machine. - 3. Evaluate Nixpkgs with `enableImportFromDerivation = true`, with the closure added to the eval paths whitelist, and with IFD partially "allowed, but with `-j0`". - What this means is no building can happen at eval time, but we can import derivations that are already built. + 3. Evaluate Nixpkgs with `enableImportFromDerivation = true`, with the closure of `allImportedDerivations` added to the eval paths whitelist, and with IFD partially "allowed, but with `-j0`". + What this means is no building can happen at eval time, but we can import the outputs of derivations that are already built and whitelisted. # Examples and Interactions [examples-and-interactions]: #examples-and-interactions From a3119780a12813c83a8e51ea0597b1410f3271ba Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 13 Oct 2021 13:44:54 -0400 Subject: [PATCH 06/20] nixpkgs-ifd: Fix typo --- rfcs/0000-nixpkgs-ifd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index f53cc5bee..694edb09e 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -55,7 +55,7 @@ The only barrier then is addressing eval resource usage costs. ## Hydra policy -Instead of kicking of single evaluations of Nixpkgs, we will kick of double evaluations: +Instead of kicking off single evaluations of Nixpkgs, we will kick off double evaluations: 1. Evaluate Nixpkgs normally. 
From a8e2f3549f3dfa5c542dfbd3d52616b75efff043 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 13 Oct 2021 14:54:55 -0400 Subject: [PATCH 07/20] nixpkgs-ifd: Mention alternative of smaller first eval --- rfcs/0000-nixpkgs-ifd.md | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 694edb09e..964a01ed6 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -124,7 +124,21 @@ TODO, demonstrate changes to Nixpkgs, e.g. using the Haskell infrastructure, in # Alternatives [alternatives]: #alternatives -The only alternative that isn't massively harder is doing nothing. +1. Instead of evaluating Nixpkgs twice, just evaluate `allImportedDerivations` the first round. + + We could do this, and would reduce total eval time, yes. + But, I think it would come at the cost of inciting great controversy. + This means users of the `enableImportFromDerivation = false` subset of Nixpkgs would still have to *wait*, for all the IFD to complete first. + And remember, with mass rebuilds, that could be quite some time. + Increasing the critical path length of *everything* we do with Nixpkgs would cause real pain in some quarters, and I don't want that to pay that as the cost of IFD. + + With the plan as written, users of packages depending on IFD do have to wait slightly longer as the first eval is longer (and we wait for it before beginning to build `allImportedDerivations`). + But I think that is fair; we would be the "new constituency", the bottom of the pecking order, and so we should be patient so that other's workflows are not disturbed. + + Longer term we could revisit this, or we could e.g. double down on automatic vendoring, committing all generated code to a second "roll-up" repo. + Many options between those two extremes; I rather not worry to much about it now and just take the conservative polite route proposed here to begin. + +2. 
As always, do nothing, and keep the status quo. # Unresolved questions [unresolved]: #unresolved-questions From 35df663157372fba0313b1efd377784dbb73c10a Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 13 Oct 2021 15:21:20 -0400 Subject: [PATCH 08/20] nixpkgs-ifd: Add eval and sphinx example Thanks @FRidh --- rfcs/0000-nixpkgs-ifd.md | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 964a01ed6..f34c463e1 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -67,7 +67,28 @@ Instead of kicking off single evaluations of Nixpkgs, we will kick off double ev # Examples and Interactions [examples-and-interactions]: #examples-and-interactions -TODO, demonstrate changes to Nixpkgs, e.g. using the Haskell infrastructure, in a fork, and link? +1. TODO, demonstrate changes to Nixpkgs, e.g. using the Haskell infrastructure, in a fork, and link? + +2. Vendoring to avoid critical path regressions + + @FRidh brings up a good example, that of GHC and Sphinx. + Today, both are in Nixpkgs, and GHC depends on Sphinx to render it's docs. + With this change, we could perhaps instead package Sphinx via some hypothetical "pypi2nix" IFD. + That would mean GHC also indirectly depends on pypi2nix. + + To avoid the fallout, we could replace the hand-written Sphinx with a vendored copy of the generate code. + We would then test that the IFD and vendored Sphinx are the same. + Sphinx, if I recall correctly, might has some non-python dependencies. + Just as we do for Haskell packages today, handwritten overs overrides of the generated stuff would remain in Nixpkgs to make that go. + In this way, Sphinx and GHC don't "regress", remaining usable from the first `enableImportFromDerivation = false` evaluation. + + Now, one might argue that GHC is not very useful except for building downstream packages. 
+ Also, with or without this PR, I have a very long-standing goal to build the compiler itself and "wired-in" libariess separately, which would allow using cabal2nix for much of GHC itself. + *If* we do that, and also *if* we decide to stop vendoring the generated Hackage packages and only rely on IFD, GHC would become a second-eval-only, `enableImportFromDerivation = true`-only package. + At that point, there might not be a reason to vendor Sphinx anymore, and so we would stop doing so and only rely on the IFD too. + + Again, note that the final paragraph of that story is purely hypothetical, just one possible future. + This RFC does *not* propose making any specific concrete packages second-eval-only. # Drawbacks [drawbacks]: #drawbacks From b93b2a9a35db8bf9f64c562d8fc86a7d52750f9f Mon Sep 17 00:00:00 2001 From: John Ericson Date: Fri, 26 Nov 2021 12:26:56 -0500 Subject: [PATCH 09/20] nixpkgs-ifd: Fix typo Thanks @SuperSandro2000 Co-authored-by: Sandro --- rfcs/0000-nixpkgs-ifd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index f34c463e1..43413f150 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -121,7 +121,7 @@ Instead of kicking off single evaluations of Nixpkgs, we will kick off double ev 3. > IFD, is too controversial, don't do it! I think this is a classic example of don't let the perfect be the enemy of the good. - The problems with IFD and the problems IFD is trying trying to address both don't let a lot of attention. + The problems with IFD and the problems IFD is trying to address both don't let a lot of attention. The fact of the matter is Nixpkgs is how this community coordinates with itself, and agrees on priorities. If it isn't being used in Nixpkgs, there is hard ceiling of how much attention it will get. 
From da89a76288c159b6446f0f12834d37c3d9eb17cb Mon Sep 17 00:00:00 2001 From: John Ericson Date: Fri, 26 Nov 2021 13:24:46 -0500 Subject: [PATCH 10/20] Apply suggestions from code review Co-authored-by: sterni --- rfcs/0000-nixpkgs-ifd.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 43413f150..681802b26 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -83,7 +83,7 @@ Instead of kicking off single evaluations of Nixpkgs, we will kick off double ev In this way, Sphinx and GHC don't "regress", remaining usable from the first `enableImportFromDerivation = false` evaluation. Now, one might argue that GHC is not very useful except for building downstream packages. - Also, with or without this PR, I have a very long-standing goal to build the compiler itself and "wired-in" libariess separately, which would allow using cabal2nix for much of GHC itself. + Also, with or without this PR, I have a very long-standing goal to build the compiler itself and "wired-in" libraries separately, which would allow using cabal2nix for much of GHC itself. *If* we do that, and also *if* we decide to stop vendoring the generated Hackage packages and only rely on IFD, GHC would become a second-eval-only, `enableImportFromDerivation = true`-only package. At that point, there might not be a reason to vendor Sphinx anymore, and so we would stop doing so and only rely on the IFD too. @@ -102,7 +102,7 @@ Instead of kicking off single evaluations of Nixpkgs, we will kick off double ev You can think of that as a bunch of ungainly massive rectangular tiles we are trying to fit on a floor, the floor being our machine resources to schedule. What would be really bad is *increasing the tile size*. This means we need a bigger floor or else laying tile is harder. - What this is doing is *increasing the number of times*. + What this is doing is *increasing the number of tiles*. 
We can simple add a second floor to solve any problems that arise from that. This is simplification, sure, but I think the parable is correct to the "real" situation, too. @@ -179,4 +179,4 @@ I will say, though, with these steps, I think we will be able to successfully co In turn, I hope these upstream packages and ecosystems might even care about packaging and integration of the sort that we do. This would create a virtuous cycle where Nix is easier to use by more people, and Nixpkgs is easier to maintain as upstream packages better match our values. -The most important future work is technical, but being able to win upstream developer hearts and minds better than before, because ultimately distribution's live and die by upstream's decisions. +The most important future work is not technical, but being able to win upstream developer hearts and minds better than before, because ultimately distribution's live and die by upstream's decisions. From e38002ddd2cc25e2681b4394631212ceb4b3d51f Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 15 Dec 2021 13:22:53 -0500 Subject: [PATCH 11/20] nixpkgs-ifd: Add shepherd team MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Jörg Thalheim --- rfcs/0000-nixpkgs-ifd.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 681802b26..4b35b9e85 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -3,8 +3,8 @@ feature: nixpkgs-ifd start-date: 2021-10-12 author: John Ericson (@Ericson2314) co-authors: (find a buddy later to help out with the RFC) -shepherd-team: (names, to be nominated and accepted by RFC steering committee) -shepherd-leader: (name to be appointed by RFC steering committee) +shepherd-team: @L-as @grahamc @sternenseemann +shepherd-leader: @sternenseemann related-issues: (will contain links to implementation PRs) --- From d7a6f929d95ebdfb877378f6b7a3995b3ce50770 Mon 
Sep 17 00:00:00 2001 From: John Ericson Date: Sun, 21 Aug 2022 11:34:47 -0400 Subject: [PATCH 12/20] nixpkgs-generated-code-policy: Rewrite and scale back goals --- rfcs/0000-nixpkgs-ifd.md | 229 +++++++++++++++++++++------------------ 1 file changed, 121 insertions(+), 108 deletions(-) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0000-nixpkgs-ifd.md index 4b35b9e85..31b22db88 100644 --- a/rfcs/0000-nixpkgs-ifd.md +++ b/rfcs/0000-nixpkgs-ifd.md @@ -1,5 +1,5 @@ --- -feature: nixpkgs-ifd +feature: nixpkgs-generated-code-policy start-date: 2021-10-12 author: John Ericson (@Ericson2314) co-authors: (find a buddy later to help out with the RFC) @@ -11,9 +11,9 @@ related-issues: (will contain links to implementation PRs) # Summary [summary]: #summary -Many of us want "import from derivation" \[hereafter, IFD\] be allowed in Nixpkgs. -IFD would be terrible for `hydra.nixos.org`, though, -The solution is to cut the Gordian knot: Allow IFD in Nixpkgs while still (effectively) prohibiting it in CI. +Nixpkgs contains non-trivial amounts of generated, rather than hand-written, code. +We want to start systematizing that code to make it easier to maintain. +There is plenty of future work building upon this we could do, but we stop here for now to avoid needing to change any tools (Nix, Hydra, etc.). # Motivation [motivation]: #motivation @@ -25,158 +25,171 @@ To this new generation of developers, the distro (or homebrew) is a crufty relic Right now, to deal with these packages, we either convert by hand, or commit lots of generated code into Nixpkgs. But I don't think either of those options is healthy or sustainable. The problem with the first is sheer effort; we'll never be able to keep up. -The problem with the second is bloating Nixpkgs but more importantly reproducability: If someone wants to update that generated code it is unclear how. 
+The problem with the second is bloating Nixpkgs but more importantly reproducibility: If someone wants to update that generated code it is unclear how. All these mean that potential users coming from this new model of development find Nix / Nixpkgs cumbersome and unsuited to their needs. -The solution *outside* of Nixpkgs is today is "import from derivation", i.e. building code in Nix to be consumed at eval time. -Many institutional users of Nix use this. -But while the practice is banned in Nixpkgs, those efforts are not very coordinated, and the `lang2nix` ecosystem has a hard time getting off the ground. - -I am *not* arguing that IFD is the best possible solution. -But it's the one we've got to day, and long term alternatives, like RFC #92, face *significant* hurdles in being ergonomic and integrating with current idioms in Nixpkgs -- e.g. the `meta` on every derivation from `mkDerivation`. -In the spirit of learning to walk before learning to run, and beginning to acknowledge addresses these problems, we are best-serviced by getting IFD in Nixpkgs as a first-gen solution as soon as possible. -The only barrier then is addressing eval resource usage costs. +The lowest hanging fruit is to systematize our generated code. +We should ensure anyone can update the generated code, which means it should be built in derivations, not in some ad-hoc way. +In short, we should apply the same level of rigour that we do for packages themselves to generated code. # Detailed design [design]: #detailed-design ## Nixpkgs -1. Add a new `enableImportFromDerivation` config parameter to Nixpkgs. - When it is `false`, anything using IFD must be disabled so that a regular evaluation like we do today succeeds. +1. Establish the policy that all generated code in Nixpkgs must be produced by a derivation. + The derivation should be built by CI (so exposed by Nixpkgs in some fashion). -2. Add a new `allImportedDerivations` top-level attribute. 
- This *must* be buildable with `enableImportFromDerivation = false`. - It *must* have in its run-time closure any derivation output that Nixpkgs with `enableImportFromDerivation = true` imports. - \(CI will verify these conditions as described in the next subsection.\) +2. Implement script(s) for maintainers which automatically builds these derivations and vendors their results to the appropriate places. + Running such scripts should be sufficient to regenerated all generated code in Nixpkgs. -3. Any code vendored in Nixpkgs *must* correspond to code produced in an imported derivation, so the code can be mechanistically re-vendored. - We should write tests that each pair of vendored and computed derivations are the same. +3. Ensure via CI that the vendored generated code is exactly what running the scripts produce. + This check should be one of the "channel blocking" CI jobs. -## Hydra policy +# Examples and Interactions +[examples-and-interactions]: #examples-and-interactions -Instead of kicking off single evaluations of Nixpkgs, we will kick off double evaluations: +## Impurities - 1. Evaluate Nixpkgs normally. +Many `lang2nix`-type tools have impure steps today. +Since these tools must be only invoked inside the derivations to generate code, the impure inputs must be gotten via fixed output derivations. +This might require changes to those tools - 2. Build `allImportedDerivations`, and copy its closure to the evaluation machine. +Updating fixed output hashes and similar however is perfectly normal and not affected by this RFC. +The updates can be performed by hand, or with update bots like today. +The update bots would just need to learn to run the regeneration script (or risk failing CI because the vendored generated code is caught as being out of date). - 3. Evaluate Nixpkgs with `enableImportFromDerivation = true`, with the closure of `allImportedDerivations` added to the eval paths whitelist, and with IFD partially "allowed, but with `-j0`". 
- What this means is no building can happen at eval time, but we can import the outputs of derivations that are already built and whitelisted. +## Idempotency and bootstrapping -# Examples and Interactions -[examples-and-interactions]: #examples-and-interactions +The test that the generated sources are up to date will have to work by regenerating those generated sources and then taking a diff. +That means the regeneration process has to be idempotent in that running it twice is the same as running it once. + +This is a bit trickier than it sounds, because many `lang2nix` tools rely on their own output. +E.g. the Nix packaging for `cabal2nix` is itself generated with `cabal2nix`. +Sane setups should work fine --- after all, it would be really weird if two valid builds of `cabal2nix` behaved so differently as to generate different code --- but it is still an issue worth being aware of. -1. TODO, demonstrate changes to Nixpkgs, e.g. using the Haskell infrastructure, in a fork, and link? +(That we continue to vendor code does at least "unroll" the bootstrapping to avoid issues that we would have with, say, import-from-derivation alone. +The vendored code works analogously to the prebuilt bootstrapping tools in this case.) -2. Vendoring to avoid critical path regressions +@sternenseemann reminds me that some `lang2nix` tools might pin a Nixpkgs today, for various reasons. +But in this plan the tools must be built with the current Nixpkgs in the CI job ensuring sources are up to date. +`lang2nix` tools must therefore be kept continuously working when built against the latest Nixpkgs. - @FRidh brings up a good example, that of GHC and Sphinx. - Today, both are in Nixpkgs, and GHC depends on Sphinx to render it's docs. - With this change, we could perhaps instead package Sphinx via some hypothetical "pypi2nix" IFD. - That would mean GHC also indirectly depends on pypi2nix. +## What CI to use? 
- To avoid the fallout, we could replace the hand-written Sphinx with a vendored copy of the generate code. - We would then test that the IFD and vendored Sphinx are the same. - Sphinx, if I recall correctly, might has some non-python dependencies. - Just as we do for Haskell packages today, handwritten overs overrides of the generated stuff would remain in Nixpkgs to make that go. - In this way, Sphinx and GHC don't "regress", remaining usable from the first `enableImportFromDerivation = false` evaluation. +The easiest, and most important foundational step to do is just add a regular `release.nix` job for Hydra to test. +We might, however, want to catch these issues earlier at PR merge time, with ofborg or GitHub actions. +That is fine too. - Now, one might argue that GHC is not very useful except for building downstream packages. - Also, with or without this PR, I have a very long-standing goal to build the compiler itself and "wired-in" libraries separately, which would allow using cabal2nix for much of GHC itself. - *If* we do that, and also *if* we decide to stop vendoring the generated Hackage packages and only rely on IFD, GHC would become a second-eval-only, `enableImportFromDerivation = true`-only package. - At that point, there might not be a reason to vendor Sphinx anymore, and so we would stop doing so and only rely on the IFD too. +## Who does the work? - Again, note that the final paragraph of that story is purely hypothetical, just one possible future. - This RFC does *not* propose making any specific concrete packages second-eval-only. +In the short term, this is a decent chunk of work for `lang2nix` tool authors and language-specific packages maintainers, who must work to ensure their tools and workflows are brought into line with this policy. +That won't always be fun! 
+ +On the flip side, a major cost of today's situation is that, because so many of the workflows are more an "oral tradition" among the maintainers than fully reproducible processes, one-off contributors often need a lot of hand-holding. +@sternenseemann tells me he must spend a lot of manual time shepherding PRs, because those PR authors are unable to jump through the hoops themselves. # Drawbacks [drawbacks]: #drawbacks -1. > We've *doubled* the amount of evaluation we do, oh no! +This is now a very conservative RFC so I do not think there are any drawbacks as to the goals themselves. - Sounds scary, I know. - But I don't think that's bad, actually. - What's so bad today is the time and memory usage of *each* evaluation. +Bringing our tools into compliance with this policy will take effort, and of course that effort could be spent elsewhere, so there is opportunity cost to be aware of. +But given the general level of concern over the sustainability of Nixpkgs, I think the benefits are worth the costs. - You can think of that as a bunch of ungainly massive rectangular tiles we are trying to fit on a floor, the floor being our machine resources to schedule. - What would be really bad is *increasing the tile size*. - This means we need a bigger floor or else laying tile is harder. - What this is doing is *increasing the number of tiles*. - We can simple add a second floor to solve any problems that arise from that. +# Alternatives +[alternatives]: #alternatives - This is simplification, sure, but I think the parable is correct to the "real" situation, too. +None at this time, we had other ideas but they are reframed as future work. +The one proposed here is unquestionably the most conservative one, and basically a prerequisite of all the others. -2. > Isn't IFD really slow? +# Unresolved questions +[unresolved]: #unresolved-questions - What is slow is that evaluator has no parallelism. 
- That means is that every time we hit an *unbuilt* derivation, we block until it's finished building. - Worse, even if it is easy to run, we're probably going to check some substituters, etc., so there are all sorts of slow IO round trips making the critical path worse. - We could fix this, but there is no energy to do so right now. - Making the evaluator parallel without making our memory issues worse is hard work. +None at this time. - But, none of that matters for this proposal. - `hydra.nixos.org` will only need to read built paths, and that shouldn't be meaningfully slower than regular `import`-ing. +# Future work +[future]: #future-work -3. > IFD, is too controversial, don't do it! +## Vendor generated code "out of tree" - I think this is a classic example of don't let the perfect be the enemy of the good. - The problems with IFD and the problems IFD is trying to address both don't let a lot of attention. - The fact of the matter is Nixpkgs is how this community coordinates with itself, and agrees on priorities. - If it isn't being used in Nixpkgs, there is hard ceiling of how much attention it will get. +The first issue that remains after this RFC is generated code still bloats the Nixpkgs history. +It would be nice to get it "out of tree" (outside the Nixpkgs repo) so this is no longer the case. +In our shepherd discussions we had two ideas for how this might proceed. - The benefits of IFD don't get enough attention. - A package in Nixpkgs is more than a derivation: there's the `meta` as I mentioned above. - There's also being able to read the code and (somewhat) understand what's going in reference to other derivations. - Finally, there's being able to `override`, `overrideAttrs`, etc. the derivation downstream. - IFD alone allows computed packages that follow all these norms. 
+It was tempting to go straight to proposing one of these as part of the RFC proper, +but they both contained enough hard-to-surmount issues that we figured it was better to start something more conservative first. -4. Say I want IFD that depends on other IFD? +### Dump in other repo and fetch it - In other words, can one import a derivation that is itself evaluated from and import from a derivation? - No, not without introducing another round of building and evaluating for Hydra. - But I don't think we need arbitrarily-deep dynamism anyways: it is a tool that should be used with care anyways, because stasis \[staticism?\] is the goodly disciplinarian that makes Nixpkgs so great. +We could opt to offload all generated code into a separate repository which would become an optional additional input to nixpkgs. +This could be done via an extra `fetchTarball`, possibly a (somehow synced) channel or, in the presence of experimental features, a flake input. - That said, `cabal2nix` is written in Haskell, `crate2nix` is written in Rust, etc. etc. - We can vendor enough code to build these tools and thus bootstrap the IFD we will do. - Per the 3rd rule for Nixpkgs above, as long as we make the vendoring automatic and pure, this is fine, and improvement upon today. - Also, even if we didn't have the "one round of dynamism" restriction, we would still have the bootstrapping issue. +#### Drawbacks -# Alternatives -[alternatives]: #alternatives +- This would be a truly breaking change to nixpkgs user interface: + Either an additional input would need to be provided or fetched (which wouldn't interact well with restrict-eval). -1. Instead of evaluating Nixpkgs twice, just evaluate `allImportedDerivations` the first round. +- Generated code becomes a second class as the extra input would need to be optional for this reason. + This is problematic for central packages that use code generation already today (pandoc, cachix, …). 
- We could do this, and would reduce total eval time, yes. - But, I think it would come at the cost of inciting great controversy. - This means users of the `enableImportFromDerivation = false` subset of Nixpkgs would still have to *wait*, for all the IFD to complete first. - And remember, with mass rebuilds, that could be quite some time. - Increasing the critical path length of *everything* we do with Nixpkgs would cause real pain in some quarters, and I don't want that to pay that as the cost of IFD. +- Similar Bootstrapping problems as the other alternative below: new generated code needs nixpkgs and a previous version of the generated code. - With the plan as written, users of packages depending on IFD do have to wait slightly longer as the first eval is longer (and we wait for it before beginning to build `allImportedDerivations`). - But I think that is fair; we would be the "new constituency", the bottom of the pecking order, and so we should be patient so that other's workflows are not disturbed. +- `builtins.fetch*` is a nuisance to deal with at the moment and would probably need to be improved to make this work. + E.g. gcrooting this evaluation only dependency could prove tricky without changes to Nix. - Longer term we could revisit this, or we could e.g. double down on automatic vendoring, committing all generated code to a second "roll-up" repo. - Many options between those two extremes; I rather not worry to much about it now and just take the conservative polite route proposed here to begin. +- Extra bureaucracy would be involved with updating the generated repository and the reference to it in nixpkgs. + Additionally, special support in CI would be required for this. -2. As always, do nothing, and keep the status quo. +### Nixpkgs itself becomes a derivation output -# Unresolved questions -[unresolved]: #unresolved-questions +This alternative implementation was proposed by @L-as at the meeting. 
+The idea is that nixpkgs would become a derivation that builds a “regular” nixpkgs source tree by augmenting files available statically with code generation. -Should we call it "import from Derivation", or should we give it a different name? -`builtins.readFile ` is really the same thing for our purposes, so I am sympathetic to renaming. +The upside of this would be that there would only be one instance of IFD that can ever happen, namely when the source tree is built. +The produced store path then would require no IFD, and it would be obvious what relates to IFD and what doesn't. -# Future work -[future]: #future-work +In practice, IFD would not be necessary for users of nixpkgs if we can design a mechanism that allows the dynamically produced nixpkgs source tree to be used as a channel. +Then the IFD would only need to be executed when working on nixpkgs. + +#### Drawbacks + +- This approach creates a bootstrapping problem for the entirety of nixpkgs, not just for the IFD parts. + It would be necessary to build the new nixpkgs source tree using an old version of the nixpkgs source tree. + This could either be done using a fixed “nixpkgs bootstrap tarball” which occasionally needs to be bumped manually as code generation tools require newer dependencies, or by pulling in the latest nixpkgs source tree produced by e.g. Hydra. + The latter approach of course runs the risk of getting stuck at a bad nixpkgs revision which is unable to build the next ones fixing the problem. + +- Working on nixpkgs may involve more friction: It'd require a bootstrap nixpkgs to be available and executing the IFD for the nixpkgs source tree, likely involving hundreds of derivations. + +- Hydra jobsets would need to be sequenced: First the new nixpkgs source tree would need to be built before it can be passed on to the regular `nixpkgs:trunk`, `nixos:trunk-combined` etc. jobsets. 
+ +- Channel release would change significantly: Instead of having a nixpkgs git revision from which a channel tarball is produced (mostly by adding version information to the tree), a checkout of nixpkgs would produce a store path from which the channel tarball would be produced. + This could especially pose a problem for the experimental Flakes feature which currently (to my knowledge) assumes that inputs are git repositories. -In a grand future we might do things completely differently. -But I have no idea how stuff is going to shake out. -In particular, if we don't do something like this, I don't think we will ever get to that future. -So even if this technically "barking up the wrong tree", I think it is a necessary first step to get things going. +## Import from derivation -I will say, though, with these steps, I think we will be able to successfully convert to Nix a bunch of developers that mainly work in one language, and didn't even think they were in need of a better distro. +Even if we store the generated sources outside of tree, we are still doing the tedious work of semi-manually remaining a build cache (this time of Nix code). +Isn't that what Nix itself is for! + +"import from derivation" is a technique where Nix code can simply import the result of a build, with no vendoring generated code in-tree or out-of-tree needed. + +There are a number of implementation issues with it, however, that means we can't simply enable it on `hydra.nixos.org` today. +We have some "low tech" mitigations that were the original body of this RFC, +but they still require changing tools (Hydra), which adds latency and risk to the project. + +## Reaching developers, more broadly + +This proposal is far from the final decision on how language-specific ecosystems packages should be dealt with. +I make no predictions for the far future, it is possible we will eventually land on something completely different. 
+ +However, I think this RFC will help us reach a very big milestone where the `lang2nix` ecosystem and Nixpkgs will both be talking to each other a bit better, not just Nixpkgs saying things but not listening to a chaotic and disorganized `lang2nix` ecosystem. +This culture shift I think will be the main and most important legacy of this RFC. + +A lot of developers come to the Nix ecosystem, and find that the tools work great for sysadmin-y or power-user-y things (NixOS, home-manager, etc.) but the development experience is not nearly as clearly better than using language-specific tools in comparison. +(I prefer it, but the tradeoffs are very complex.) +With the new both-ways communication described above, I think we'll have a huge leg up in refining best practices so that ultimately we have better developement workflows, and retain these people better. + +The developers I am most eager to reach are those of major upstream projects In turn, I hope these upstream packages and ecosystems might even care about packaging and integration of the sort that we do. This would create a virtuous cycle where Nix is easier to use by more people, and Nixpkgs is easier to maintain as upstream packages better match our values. - -The most important future work is not technical, but being able to win upstream developer hearts and minds better than before, because ultimately distribution's live and die by upstream's decisions. +Instead of a situation where distros and upstream projects don't really like each other, we might end up with a situation where they all get along via Nix. 
From d38062a51c9a01cbab795908f68d05bee39c7d21 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Sun, 21 Aug 2022 11:38:07 -0400 Subject: [PATCH 13/20] nixpkgs-generated-code-policy: Rename proposal --- ...{0000-nixpkgs-ifd.md => 0109-nixpkgs-generated-code-policy.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{0000-nixpkgs-ifd.md => 0109-nixpkgs-generated-code-policy.md} (100%) diff --git a/rfcs/0000-nixpkgs-ifd.md b/rfcs/0109-nixpkgs-generated-code-policy.md similarity index 100% rename from rfcs/0000-nixpkgs-ifd.md rename to rfcs/0109-nixpkgs-generated-code-policy.md From e7249814af6998b5c50959e528bb74985ad9a32d Mon Sep 17 00:00:00 2001 From: John Ericson Date: Sun, 21 Aug 2022 11:41:13 -0400 Subject: [PATCH 14/20] nixpkgs-generated-code-policy: Remove some indirect motivation --- rfcs/0109-nixpkgs-generated-code-policy.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/rfcs/0109-nixpkgs-generated-code-policy.md b/rfcs/0109-nixpkgs-generated-code-policy.md index 31b22db88..3797ddf16 100644 --- a/rfcs/0109-nixpkgs-generated-code-policy.md +++ b/rfcs/0109-nixpkgs-generated-code-policy.md @@ -20,7 +20,6 @@ There is plenty of future work building upon this we could do, but we stop here Nixpkgs, along with every other distro, also faces a looming crisis: new open source software is increasingly not intended to be packaged by distros at all. Many languages now support very large library ecosystems, with dependencies expressed in a language-specific package manager. -To this new generation of developers, the distro (or homebrew) is a crufty relic from an earlier age to bootstrap modernity, and then be forgotten about. Right now, to deal with these packages, we either convert by hand, or commit lots of generated code into Nixpkgs. But I don't think either of those options is healthy or sustainable. 
@@ -188,8 +187,3 @@ This culture shift I think will be the main and most important legacy of this RF A lot of developers come to the Nix ecosystem, and find that the tools work great for sysadmin-y or power-user-y things (NixOS, home-manager, etc.) but the development experience is not nearly as clearly better than using language-specific tools in comparison. (I prefer it, but the tradeoffs are very complex.) With the new both-ways communication described above, I think we'll have a huge leg up in refining best practices so that ultimately we have better developement workflows, and retain these people better. - -The developers I am most eager to reach are those of major upstream projects -In turn, I hope these upstream packages and ecosystems might even care about packaging and integration of the sort that we do. -This would create a virtuous cycle where Nix is easier to use by more people, and Nixpkgs is easier to maintain as upstream packages better match our values. -Instead of a situation where distros and upstream projects don't really like each other, we might end up with a situation where they all get along via Nix. From d3b8f6f23178b879b719503abd1a563fea1097af Mon Sep 17 00:00:00 2001 From: John Ericson Date: Tue, 23 Aug 2022 11:12:04 -0400 Subject: [PATCH 15/20] nixpkgs-generated-code-policy: Emphasize that the future work is highly tentative --- rfcs/0109-nixpkgs-generated-code-policy.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rfcs/0109-nixpkgs-generated-code-policy.md b/rfcs/0109-nixpkgs-generated-code-policy.md index 3797ddf16..74fe53bfe 100644 --- a/rfcs/0109-nixpkgs-generated-code-policy.md +++ b/rfcs/0109-nixpkgs-generated-code-policy.md @@ -99,7 +99,7 @@ But given the general level of concern over the sustainability of Nixpkgs, I thi # Alternatives [alternatives]: #alternatives -None at this time, we had other ideas but they are reframed as future work. 
+None at this time, we had other ideas but they are reframed as possible future work. The one proposed here is unquestionably the most conservative one, and basically a prerequisite of all the others. # Unresolved questions @@ -110,7 +110,7 @@ None at this time. # Future work [future]: #future-work -## Vendor generated code "out of tree" +## A possible 2nd step: Vendor generated code "out of tree" The first issue that remains after this RFC is generated code still bloats the Nixpkgs history. It would be nice to get it "out of tree" (outside the Nixpkgs repo) so this is no longer the case. @@ -119,7 +119,7 @@ In our shepherd discussions we had two ideas for how this might proceed. It was tempting to go straight to proposing one of these as part of the RFC proper, but they both contained enough hard-to-surmount issues that we figured it was better to start something more conservative first. -### Dump in other repo and fetch it +### Alternative 1: Dump in other repo and fetch it We could opt to offload all generated code into a separate repository which would become an optional additional input to nixpkgs. This could be done via an extra `fetchTarball`, possibly a (somehow synced) channel or, in the presence of experimental features, a flake input. @@ -140,7 +140,7 @@ This could be done via an extra `fetchTarball`, possibly a (somehow synced) chan - Extra bureaucracy would be involved with updating the generated repository and the reference to it in nixpkgs. Additionally, special support in CI would be required for this. -### Nixpkgs itself becomes a derivation output +### Alternative 2: Nixpkgs itself becomes a derivation output This alternative implementation was proposed by @L-as at the meeting. The idea is that nixpkgs would become a derivation that builds a “regular” nixpkgs source tree by augmenting files available statically with code generation. @@ -165,7 +165,7 @@ Then the IFD would only need to be executed when working on nixpkgs. 
- Channel release would change significantly: Instead of having a nixpkgs git revision from which a channel tarball is produced (mostly by adding version information to the tree), a checkout of nixpkgs would produce a store path from which the channel tarball would be produced. This could especially pose a problem for the experimental Flakes feature which currently (to my knowledge) assumes that inputs are git repositories. -## Import from derivation +## A possible 3rd step: Import from derivation Even if we store the generated sources outside of tree, we are still doing the tedious work of semi-manually remaining a build cache (this time of Nix code). Isn't that what Nix itself is for! From f4d1e9b288a9401b8de172ca28dd147209a4fe02 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Tue, 23 Aug 2022 11:15:18 -0400 Subject: [PATCH 16/20] nixpkgs-generated-code-policy: Relate future work in alt section more explicitly --- rfcs/0109-nixpkgs-generated-code-policy.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rfcs/0109-nixpkgs-generated-code-policy.md b/rfcs/0109-nixpkgs-generated-code-policy.md index 74fe53bfe..cd82ba79e 100644 --- a/rfcs/0109-nixpkgs-generated-code-policy.md +++ b/rfcs/0109-nixpkgs-generated-code-policy.md @@ -99,8 +99,10 @@ But given the general level of concern over the sustainability of Nixpkgs, I thi # Alternatives [alternatives]: #alternatives -None at this time, we had other ideas but they are reframed as possible future work. -The one proposed here is unquestionably the most conservative one, and basically a prerequisite of all the others. +No good this time, we had other ideas but they are reframed as *possible* future work. +It is unclear which of the alternative "2nd steps" is better, or whether we ought to try to jump ahead straight to the "3rd step". 
+ +The plan proposed here is unquestionably the most conservative one, and basically a prerequisite of all the others --- a first step no matter what we plan to do afterwards. # Unresolved questions [unresolved]: #unresolved-questions From 1ab91b6a703ad23fac7850235a62d7a0e7af3403 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Tue, 23 Aug 2022 11:34:08 -0400 Subject: [PATCH 17/20] nixpkgs-generated-code-policy: More on impurities Also talk about migration plan in the form of rigid enforcement for new tools but a grace period for existing tools. --- rfcs/0109-nixpkgs-generated-code-policy.md | 27 ++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/rfcs/0109-nixpkgs-generated-code-policy.md b/rfcs/0109-nixpkgs-generated-code-policy.md index cd82ba79e..d0a9c8acc 100644 --- a/rfcs/0109-nixpkgs-generated-code-policy.md +++ b/rfcs/0109-nixpkgs-generated-code-policy.md @@ -42,6 +42,9 @@ In short, we should apply the same level of rigour that we do for packages thems 2. Implement script(s) for maintainers which automatically builds these derivations and vendors their results to the appropriate places. Running such scripts should be sufficient to regenerated all generated code in Nixpkgs. + Greenfield tooling should not be merged unless it complies with the policy from day one. + Existing non-compliant tooling doesn't need to be ripped out of Nixpkgs, but the "grace period" in which it is brought into compliance should be bounded. + 3. Ensure via CI that the vendored generated code is exactly what running the scripts produce. This check should be one of the "channel blocking" CI jobs. @@ -52,10 +55,17 @@ In short, we should apply the same level of rigour that we do for packages thems Many `lang2nix`-type tools have impure steps today. Since these tools must be only invoked inside the derivations to generate code, the impure inputs must be gotten via fixed output derivations. 
-This might require changes to those tools +This might require changes to those tools to separate the pure work from the impure gathering steps. + +Additionally, as @7c6f434c point out, some upstream tooling thinks it is being pure, but the "lock files" (or similar) pinning mechanism it provides isn't up to the task for Nix's purposes. +Quicklisp, for example, uses a "weird mix of MD5 constraints and SHA1 constraints" that isn't really up to the task. +Another example would be using git commit hashes, which, since we don't want to download the whole history, are not good enough on their own. + +A concrete example of a change that would bring such tooling into compliance is via "prefetching" to build a map of insufficient upstream-tool keys (say a pair of a name and lousy hash) to higher quality hashes for fixed output derivations. +The prefetching step would be run impurely but do as little work as possible, and the remaining bulk of the work would be done purely in derivations. -Updating fixed output hashes and similar however is perfectly normal and not affected by this RFC. -The updates can be performed by hand, or with update bots like today. +Updating fixed output hashes and similar --- including running such a prefetch script as described above --- however, is perfectly normal and not affected by this RFC. +Such updates, as opposed to regenerations of Nix code, can be performed by hand, or with update bots like today. The update bots would just need to learn to run the regeneration script (or risk failing CI because the vendored generated code is caught as being out of date). ## Idempotency and bootstrapping @@ -107,7 +117,7 @@ The plan proposed here is unquestionably the most conservative one, and basicall # Unresolved questions [unresolved]: #unresolved-questions -None at this time. +How long should the "grace period" for bringing existing tooling into compliance be? 
 # Future work
 [future]: #future-work
@@ -178,6 +188,15 @@ There are a number of implementation issues with it, however, that means we can'
 We have some "low tech" mitigations that were the original body of this RFC,
 but they still require changing tools (Hydra), which adds latency and risk to the project.
 
+## Getting upstream tools to agree on how to pin source code
+
+A source of frustration outlined in the [Impurities](#impurities) section is when upstream tools think they are pinning exactly dependencies down, but nonetheless do so in a way that isn't good enough for our purposes.
+A long standing goal of mine is to try to communicate these concerns back upstream, and nudge everyone agreeing on a common definition of what a pinned deps looks like.
+
+I think policies such as this RFC proposes will allow us to get our `lang2nix` infrastructure in a more state not only more legible to ourselves (Nix users and contributors) but also upstream developers who won't want to spend too long investigating what exactly our requirements are.
+That will make such concerns easier to communicate, and I think unlock the gradual convergence on a standard.
+That's the hope at least!
+
 ## Reaching developers, more broadly
 
 This proposal is far from the final decision on how language-specific ecosystems packages should be dealt with.

From 24078f4533b298d4acf200bda9a3c8d1712d0a37 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Wed, 24 Aug 2022 15:12:43 -0400
Subject: [PATCH 18/20] Fix typoes

Thanks!

Co-authored-by: ash
---
 rfcs/0109-nixpkgs-generated-code-policy.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rfcs/0109-nixpkgs-generated-code-policy.md b/rfcs/0109-nixpkgs-generated-code-policy.md
index d0a9c8acc..ddb91eb3b 100644
--- a/rfcs/0109-nixpkgs-generated-code-policy.md
+++ b/rfcs/0109-nixpkgs-generated-code-policy.md
@@ -29,7 +29,7 @@ All these mean that potential users coming from this new model of development fi
 
 The lowest hanging fruit is to systematize our generated code.
 We should ensure anyone can update the generated code, which means it should be built in derivations not some ad-hoc way.
-In short, we should apply the same level of rigour that we do for packages themselves to generate code.
+In short, we should apply the same level of rigour that we do for packages themselves to generated code.
 
 # Detailed design
 [design]: #detailed-design
@@ -193,7 +193,7 @@ but they still require changing tools (Hydra), which adds latency and risk to th
 A source of frustration outlined in the [Impurities](#impurities) section is when upstream tools think they are pinning exactly dependencies down, but nonetheless do so in a way that isn't good enough for our purposes.
 A long standing goal of mine is to try to communicate these concerns back upstream, and nudge everyone agreeing on a common definition of what a pinned deps looks like.
 
-I think policies such as this RFC proposes will allow us to get our `lang2nix` infrastructure in a more state not only more legible to ourselves (Nix users and contributors) but also upstream developers who won't want to spend too long investigating what exactly our requirements are.
+I think policies such as this RFC proposes will allow us to get our `lang2nix` infrastructure in a state not only more legible to ourselves (Nix users and contributors) but also to upstream developers who won't want to spend too long investigating what exactly our requirements are.
 That will make such concerns easier to communicate, and I think unlock the gradual convergence on a standard.
 That's the hope at least!
 

From 47cb1a9d652c670bfa4743ea5936940c884daf8a Mon Sep 17 00:00:00 2001
From: Eelco Dolstra
Date: Wed, 7 Sep 2022 15:23:46 +0200
Subject: [PATCH 19/20] Update shepherd team

---
 rfcs/0109-nixpkgs-generated-code-policy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/0109-nixpkgs-generated-code-policy.md b/rfcs/0109-nixpkgs-generated-code-policy.md
index ddb91eb3b..7b0f172d1 100644
--- a/rfcs/0109-nixpkgs-generated-code-policy.md
+++ b/rfcs/0109-nixpkgs-generated-code-policy.md
@@ -3,7 +3,7 @@ feature: nixpkgs-generated-code-policy
 start-date: 2021-10-12
 author: John Ericson (@Ericson2314)
 co-authors: (find a buddy later to help out with the RFC)
-shepherd-team: @L-as @grahamc @sternenseemann
+shepherd-team: @L-as @sternenseemann @tomberek @DavHau
 shepherd-leader: @sternenseemann
 related-issues: (will contain links to implementation PRs)
 ---

From 3876e77c52da39dead0580fd6bab7f8b11ba9ec8 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Thu, 15 Sep 2022 21:35:49 -0500
Subject: [PATCH 20/20] Apply suggestions from code review

Thanks!

Co-authored-by: Adam Joseph <54836058+amjoseph-nixpkgs@users.noreply.github.com>
---
 rfcs/0109-nixpkgs-generated-code-policy.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rfcs/0109-nixpkgs-generated-code-policy.md b/rfcs/0109-nixpkgs-generated-code-policy.md
index 7b0f172d1..3db8fbd50 100644
--- a/rfcs/0109-nixpkgs-generated-code-policy.md
+++ b/rfcs/0109-nixpkgs-generated-code-policy.md
@@ -61,8 +61,8 @@ Additionally, as @7c6f434c point out, some upstream tooling thinks it is being p
 Quicklisp, for example, uses a "weird mix of MD5 constraints and SHA1 constraints" that isn't really up to the task.
 Another example would be using git commit hashes, which, since we don't want to download the whole history, are not good enough on their own.
 
-A concrete example of a change that would bring such tooling into compliance is via "prefetching" to build a map of insufficient upstream-tool keys (say a pair of a name and lousy hash) to higher quality hashes for fixed output derivations.
-The prefetching step would be run impurely but do as little work as possible, and the remaining bulk of the work would be done purely in derivations.
+A concrete example of a change that would bring such tooling into compliance is via "map-building" to build a map of insufficient upstream-tool keys (say a pair of a name and lousy hash) to higher quality hashes for fixed output derivations.
+The map-building step would be run impurely but do as little work as possible, and the remaining bulk of the work would be done purely in derivations.
 
 Updating fixed output hashes and similar --- including running such a prefetch script as described above --- however, is perfectly normal and not affected by this RFC.
 Such updates, as opposed to regenerations of Nix code, can be performed by hand, or with update bots like today.