Skip to content
This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

Runcommand/dynamic #2

Closed
wants to merge 20 commits into from
Closed

Runcommand/dynamic #2

wants to merge 20 commits into from

Conversation

grahamc
Copy link
Member

@grahamc grahamc commented Dec 15, 2021

Dynamically define jobs within a jobset by:

  1. Enabling them in the hydra.conf:
<dynamicruncommand>
  enable = 1
</dynamicruncommand>
  1. Set enable_dynamic_run_command to true on the project record and the jobset record in the database
  2. Create a job in the jobset:
{ pkgs, ... }: {
    runCommandHook = {
        recurseForDerivations = true;
        example = pkgs.writeScript "run-me" ''
          #!${pkgs.runtimeShell}
          ${pkgs.jq}/bin/jq . "$HYDRA_JSON"
        '';
    };
}

lead-up PRs to this:

Includes:

TODO

  • Extend the terraform provider to include dynamic runcommand support flags
  • Declarative jobsets should be rejected with a useful error if the server or project disable dynamic runcommand and the jobset specifies it as true
    • Semi-complete: we now log that the jobset was rejected, but there's no way (currently) to reject a declarative jobset PUT via the API.
  • Extend the API documentation to include the project and jobset dynamic runcommand support flags
  • Require the project enable dynamic run command support as well to avoid a declarative jobset from enabling it when it shouldn't be enabled.
  • Update runcommand.md to include the requirement that the jobset and project require the feature to be enabled
  • The API should reject updates enabling dynamic run command on a jobset if the project does and/or server disable it
  • The API should reject updates enabling dynamic run command on a project if server disables it
  • Don't let users in the UI enable dynamic runcommands if the server or project or jobset disable it.
  • Tell users why they can't enable dynamic runcommands in the UI.
  • If the server's config has dynamic runcommands disabled, on the project configuration page say it is not enabled because the server doesn't enable it.
  • If the server's config has dynamic runcommands disabled, on the jobset configuration page say it is not enabled because the server doesn't enable it.
  • If the project's config has dynamic runcommands disabled, on the jobset configuration page say it is not enabled because the project doesn't enable it.

@grahamc grahamc force-pushed the runcommand/dynamic branch 3 times, most recently from 6b0a2af to 7462a6f Compare December 15, 2021 17:06
@cole-h cole-h force-pushed the runcommand/dynamic branch 8 times, most recently from 814c2d0 to cf5d1fd Compare December 20, 2021 22:14
@grahamc grahamc force-pushed the runcommand/dynamic branch from cf5d1fd to 478b14c Compare January 3, 2022 16:07
This in-progress feature will run a dynamically generated set of
buildFinished hooks, which must be nested under the `runCommandHook.*`
attribute set. This implementation is not very good, with some to-dos:

1. Only run if the build succeeded
2. Verify the output is named $out and that it is an executable file
   (or a symlink to a file)
3. Require the jobset itself have a flag enabling the feature, since
   this feature can be a bit dangerous if various people of different
   trust levels can create the jobs.
eligible for execution under dynamic run commands.
@grahamc grahamc force-pushed the runcommand/dynamic branch from 478b14c to e124c3b Compare January 10, 2022 14:23
@@ -94,7 +94,7 @@ sub common {
if (defined $eval->flake) {
my $fl = $eval->flake;
print STDERR "Flake is $fl\n";
if ($eval->flake =~ m!github:([^/]+)/([^/]+)/([[:xdigit:]]{40})$! or $eval->flake =~ m!git\+ssh://git\@github.com/([^/]+)/([^/]+)\?.*rev=([[:xdigit:]]{40})$!) {
if ($eval->flake =~ m!github:([^/]+)/([^/]+)/([[:xdigit:]]{40})$! or $eval->flake =~ m!git\+ssh://git\@github[^/]+/([^/]+)/([^/]+)\?.*rev=([[:xdigit:]]{40})$!) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should break this matcher out in to a separate function, write some tests for it, and upstream it separately. The bit about doCheck should be dropped.

Comment on lines 264 to 270
my $enable_dynamic_run_command = defined $c->stash->{params}->{enable_dynamic_run_command} ? 1 : 0;
if ($enable_dynamic_run_command
&& !($c->config->{dynamicruncommand}->{enable}
&& $jobset->project->enable_dynamic_run_command))
{
badRequest($c, "Dynamic RunCommand is not enabled by the server or the parent project.");
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the right thing here is, but this pattern of checking has been peppered around. Instead we should probably have a single function which implements "is dynamic run command enabled?" and use that everywhere.

@cole-h cole-h force-pushed the runcommand/dynamic branch from e124c3b to ef6df46 Compare January 10, 2022 20:26
@cole-h cole-h mentioned this pull request Jan 10, 2022
@cole-h
Copy link
Member

cole-h commented Jan 10, 2022

Moved my commits to a separate PR (and also copied over your comments): #4.

@grahamc grahamc closed this Jan 11, 2022
cole-h pushed a commit that referenced this pull request Apr 7, 2022
On hydra.nixos.org the queue runner had child processes that were
stuck handling an exception:

  Thread 1 (Thread 0x7f501f7fe640 (LWP 1413473) "bld~v54h5zkhmb3"):
  #0  futex_wait (private=0, expected=2, futex_word=0x7f50c27969b0 <_rtld_local+2480>) at ../sysdeps/nptl/futex-internal.h:146
  #1  __lll_lock_wait (futex=0x7f50c27969b0 <_rtld_local+2480>, private=0) at lowlevellock.c:52
  #2  0x00007f50c21eaee4 in __GI___pthread_mutex_lock (mutex=0x7f50c27969b0 <_rtld_local+2480>) at ../nptl/pthread_mutex_lock.c:115
  #3  0x00007f50c1854bef in __GI___dl_iterate_phdr (callback=0x7f50c190c020 <_Unwind_IteratePhdrCallback>, data=0x7f501f7fb040) at dl-iteratephdr.c:40
  #4  0x00007f50c190d2d1 in _Unwind_Find_FDE () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #5  0x00007f50c19099b3 in uw_frame_state_for () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #6  0x00007f50c190ab90 in uw_init_context_1 () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #7  0x00007f50c190b08e in _Unwind_RaiseException () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #8  0x00007f50c1b02ab7 in __cxa_throw () from /nix/store/dd8swlwhpdhn6bv219562vyxhi8278hs-gcc-10.3.0-lib/lib/libstdc++.so.6
  #9  0x00007f50c1d01abe in nix::parseURL (url="[email protected]") at src/libutil/url.cc:53
  NixOS#10 0x0000000000484f55 in extraStoreArgs (machine="[email protected]") at build-remote.cc:35
  NixOS#11 operator() (__closure=0x7f4fe9fe0420) at build-remote.cc:79
  ...

Maybe the fork happened while another thread was holding some global
stack unwinding lock
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744). Anyway, since
the hanging child inherits all file descriptors to SSH clients,
shutting down remote builds (via 'child.to = -1' in
State::buildRemote()) doesn't work and 'child.pid.wait()' hangs
forever.

So let's not do any significant work between fork and exec.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants