Skip to content
This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

Dynamic RunCommand guarding #4

Merged
merged 8 commits into from
Jan 25, 2022

Conversation

cole-h
Copy link
Member

@cole-h cole-h commented Jan 10, 2022

No description provided.

@@ -94,7 +94,7 @@ sub common {
if (defined $eval->flake) {
my $fl = $eval->flake;
print STDERR "Flake is $fl\n";
if ($eval->flake =~ m!github:([^/]+)/([^/]+)/([[:xdigit:]]{40})$! or $eval->flake =~ m!git\+ssh://git\@github.com/([^/]+)/([^/]+)\?.*rev=([[:xdigit:]]{40})$!) {
if ($eval->flake =~ m!github:([^/]+)/([^/]+)/([[:xdigit:]]{40})$! or $eval->flake =~ m!git\+ssh://git\@github[^/]+/([^/]+)/([^/]+)\?.*rev=([[:xdigit:]]{40})$!) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should break this matcher out in to a separate function, write some tests for it, and upstream it separately. The bit about doCheck should be dropped.

#2 (comment)

Comment on lines +264 to +271
my $enable_dynamic_run_command = defined $c->stash->{params}->{enable_dynamic_run_command} ? 1 : 0;
if ($enable_dynamic_run_command
&& !($c->config->{dynamicruncommand}->{enable}
&& $jobset->project->enable_dynamic_run_command))
{
badRequest($c, "Dynamic RunCommand is not enabled by the server or the parent project.");
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the right thing here is, but this pattern of checking has been peppered around. Instead we should probably have a single function which implements "is dynamic run command enabled?" and use that everywhere.

#2 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, I had been experimenting with passing in the project ref itself and getting the enable_dynamic_run_command value from there (if the project was defined). However, the two places where a project is used (AddBuilds.pm and Jobsets.pm) had two different "definitions" of a project -- in AddBuilds, it was just a plain Perl hash, but in Projects, it was a DBIx thing.

This is an issue because you can only access enable_dynamic_run_command on a hash with $project->{enable_dynamic_run_command}, but for a DBIx thing, you must use $project->enable_dynamic_run_command (note the lack of curly braces).

One workaround I attempted was to get a hash from the $project DBIx object via {$project->get_columns}, but that looks pretty ugly and would be difficult to intuit if this were to ever happen elsewhere.

So then the only real way to factor this function out into a common location would be to make it take 2 arguments: whether the request wants it to be enabled (e.g. a declarative jobset or a API POST), and if it's enabled on the parent project. For projects themselves, this latter argument would be undef (and we'd check for that), and then we'd also check if it's enabled in the Hydra config before returning 0 if it was disabled somewhere with higher precedence, or 1 if everything checks out.

I think this is the best way to go about it at present, but am open to other ideas.

@cole-h cole-h mentioned this pull request Jan 10, 2022
12 tasks
@grahamc grahamc force-pushed the runcommand/dynamic branch 2 times, most recently from 43ea4cb to 34be56d Compare January 11, 2022 15:52
@grahamc grahamc force-pushed the runcommand/dynamic branch from 34be56d to 38fedea Compare January 24, 2022 19:21
@grahamc grahamc force-pushed the runcommand/dynamic-guarding branch from 7f5f772 to 2f0d8ff Compare January 24, 2022 21:18
Comment on lines +6 to +8
require Catalyst::Test;
use HTTP::Request::Common qw(POST PUT GET DELETE);
use JSON::MaybeXS qw(decode_json encode_json);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of these are not needed.

@grahamc
Copy link
Member

grahamc commented Jan 24, 2022

This PR looks great. I think the last commit needs to be popped off and put in to a different PR, but this looks solid. The various places we check for server / project / jobset support is a bit ugly, but somewhat inescapable and at least we have fairly thorough tests covering them. Nicely done.

@grahamc grahamc force-pushed the runcommand/dynamic-guarding branch from 2f0d8ff to ca44bb7 Compare January 25, 2022 14:49
@grahamc grahamc merged commit 64952e7 into runcommand/dynamic Jan 25, 2022
@grahamc grahamc deleted the runcommand/dynamic-guarding branch January 25, 2022 15:34
@grahamc grahamc restored the runcommand/dynamic-guarding branch February 11, 2022 19:34
cole-h pushed a commit that referenced this pull request Apr 7, 2022
On hydra.nixos.org the queue runner had child processes that were
stuck handling an exception:

  Thread 1 (Thread 0x7f501f7fe640 (LWP 1413473) "bld~v54h5zkhmb3"):
  #0  futex_wait (private=0, expected=2, futex_word=0x7f50c27969b0 <_rtld_local+2480>) at ../sysdeps/nptl/futex-internal.h:146
  #1  __lll_lock_wait (futex=0x7f50c27969b0 <_rtld_local+2480>, private=0) at lowlevellock.c:52
  #2  0x00007f50c21eaee4 in __GI___pthread_mutex_lock (mutex=0x7f50c27969b0 <_rtld_local+2480>) at ../nptl/pthread_mutex_lock.c:115
  #3  0x00007f50c1854bef in __GI___dl_iterate_phdr (callback=0x7f50c190c020 <_Unwind_IteratePhdrCallback>, data=0x7f501f7fb040) at dl-iteratephdr.c:40
  #4  0x00007f50c190d2d1 in _Unwind_Find_FDE () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #5  0x00007f50c19099b3 in uw_frame_state_for () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #6  0x00007f50c190ab90 in uw_init_context_1 () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #7  0x00007f50c190b08e in _Unwind_RaiseException () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
  #8  0x00007f50c1b02ab7 in __cxa_throw () from /nix/store/dd8swlwhpdhn6bv219562vyxhi8278hs-gcc-10.3.0-lib/lib/libstdc++.so.6
  #9  0x00007f50c1d01abe in nix::parseURL (url="[email protected]") at src/libutil/url.cc:53
  NixOS#10 0x0000000000484f55 in extraStoreArgs (machine="[email protected]") at build-remote.cc:35
  NixOS#11 operator() (__closure=0x7f4fe9fe0420) at build-remote.cc:79
  ...

Maybe the fork happened while another thread was holding some global
stack unwinding lock
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744). Anyway, since
the hanging child inherits all file descriptors to SSH clients,
shutting down remote builds (via 'child.to = -1' in
State::buildRemote()) doesn't work and 'child.pid.wait()' hangs
forever.

So let's not do any significant work between fork and exec.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants