Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jenkins 31084 graceful shutdown #115

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions client/src/main/java/hudson/plugins/swarm/Client.java
Original file line number Diff line number Diff line change
Expand Up @@ -145,10 +145,17 @@ public void run(SwarmClient swarmClient, String... args) throws InterruptedExcep
// The Jenkins that we are trying to connect to.
Candidate target;

ShutdownHook shutdownHook = null;

// wait until we get the ACK back
int retry = 0;
while (true) {
try {
if (shutdownHook != null) {
Runtime.getRuntime().removeShutdownHook(shutdownHook);
shutdownHook = null;
}

if (options.master == null) {
logger.info("No Jenkins master supplied on command line, performing auto-discovery");
target = swarmClient.discoverFromBroadcast();
Expand Down Expand Up @@ -190,7 +197,13 @@ public void run(SwarmClient swarmClient, String... args) throws InterruptedExcep
labelFileWatcherThread.start();
}

/*
* Set up a shutdown hook so we disconnect gracefully
*/
shutdownHook = new ShutdownHook(swarmClient, target);
Runtime.getRuntime().addShutdownHook(shutdownHook);
swarmClient.connect(target);

if (options.noRetryAfterConnected) {
logger.warning("Connection closed, exiting...");
swarmClient.exitWithStatus(0);
Expand Down Expand Up @@ -285,4 +298,26 @@ private static boolean isDefaultOption(String key, Object value, CmdLineParser d
}
return false;
}


public static class ShutdownHook extends Thread {
private SwarmClient client;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] These can be private final.

private Candidate target;

public ShutdownHook(SwarmClient client, Candidate target) {
this.client = client;
this.target = target;
}

@Override
public void run() {
logger.warning("Client shutdown initiated");
try {
client.takeSlaveOfflineAndWait(target, "JVM is shutting down");
} catch (Exception e) {
logger.log(Level.SEVERE, "Unable to perform graceful slave shutdown: ", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if you started a shutdown operation (e.g., by sending a SIGINT with Ctrl-C or a SIGTERM with kill(1)) and then sent another signal (e.g., another SIGINT by pressing Ctrl-C again) while we were in the middle of takeSlaveOfflineAndWait()? I think the expected behavior is that we would stop immediately without continuing to wait in takeSlaveOfflineAndWait(). I think that your current code probably already handles this in that an InterruptedException would be thrown (and you are catching Exception here). It might be worth validating this with a manual test. Even if the code works as expected, might it be worth printing an additional message in this case (e.g., "terminating with extreme prejudice")?

}
}
}

}
89 changes: 89 additions & 0 deletions client/src/main/java/hudson/plugins/swarm/SwarmClient.java
Original file line number Diff line number Diff line change
Expand Up @@ -521,6 +521,57 @@ protected void createSwarmSlave(Candidate target) throws IOException, RetryExcep
}
}

protected static synchronized void postSlaveOffline(
String name,
CloseableHttpClient client,
HttpClientContext context,
Candidate target,
String reason)
throws IOException, RetryException {
HttpPost post =
new HttpPost(
target.url
+ "plugin/swarm/markSlaveOffline?name="
+ name
+ "&secret="
+ target.secret
+ SwarmClient.param("reason", reason));

post.addHeader("Connection", "close");

Crumb csrfCrumb = SwarmClient.getCsrfCrumb(client, context, target);
if (csrfCrumb != null) {
post.addHeader(csrfCrumb.crumbRequestField, csrfCrumb.crumb);
}

try (CloseableHttpResponse response = client.execute(post, context)) {
if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
String msg =
String.format(
"Failed to mark slave offline. %s - %s",
response.getStatusLine().getStatusCode(),
EntityUtils.toString(response.getEntity(), UTF_8));
logger.log(Level.SEVERE, msg);
throw new RetryException(msg);
}
}
}

public void takeSlaveOfflineAndWait(Candidate target, String reason) throws IOException, RetryException, InterruptedException {
logger.log(Level.WARNING, "WARNING: Taking this slave instance down due to: " + reason);
URL urlForAuth = new URL(target.getURL());
CloseableHttpClient h = createHttpClient(urlForAuth);
HttpClientContext context = createHttpClientContext(urlForAuth);

postSlaveOffline(name, h, context, target, reason);

while(!getSlaveReadyForShutdown(name, h, context, target)) {
logger.info("Slave not ready for shutdown, waiting");
Thread.sleep(60000); //TODO: configure this and the total duration
}
}


protected static synchronized void postLabelRemove(
String name,
String labels,
Expand Down Expand Up @@ -557,6 +608,43 @@ protected static synchronized void postLabelRemove(
}
}

private static synchronized boolean getSlaveReadyForShutdown(String name,
CloseableHttpClient client,
HttpClientContext context,
Candidate target) throws IOException, RetryException {
HttpGet get =
new HttpGet(
target.url
+ "plugin/swarm/getSlaveReadyForShutdown?name="
+ name);

get.addHeader("Connection", "close");

boolean result = false;
try (CloseableHttpResponse response = client.execute(get, context)) {
if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
String msg =
String.format(
"Failed to check slave ready for shutdown. %s - %s",
response.getStatusLine().getStatusCode(),
EntityUtils.toString(response.getEntity(), UTF_8));
logger.log(Level.SEVERE, msg);
throw new RetryException(msg);
}
Document xml;
try {
xml = XmlUtils.parse(response.getEntity().getContent());
} catch (SAXException e) {
String msg = "Invalid XML received from getSlaveReadyForShutdown";
logger.log(Level.SEVERE, msg, e);
throw new RetryException(msg);
}
String statusString = getChildElementString(xml.getDocumentElement(), "readyStatus");
result = Boolean.valueOf(statusString);
}
return result;
}

protected static synchronized void postLabelAppend(
String name,
String labels,
Expand Down Expand Up @@ -773,4 +861,5 @@ protected static class Crumb {
}
}


}
48 changes: 43 additions & 5 deletions plugin/src/main/java/hudson/plugins/swarm/PluginImpl.java
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,10 @@
import hudson.model.Computer;
import hudson.model.Descriptor.FormException;
import hudson.model.Node;
import hudson.model.User;
import hudson.slaves.EnvironmentVariablesNodeProperty;
import hudson.slaves.NodeProperty;
import hudson.slaves.OfflineCause;
import hudson.slaves.SlaveComputer;
import hudson.tools.ToolDescriptor;
import hudson.tools.ToolInstallation;
Expand Down Expand Up @@ -73,14 +75,18 @@ public void doGetSlaveLabels(StaplerRequest req, StaplerResponse rsp, @QueryPara
return;
}

normalResponse(req, rsp, nn.getLabelString());
normalLabelResponse(req, rsp, nn.getLabelString());
}

private void normalResponse(StaplerRequest req, StaplerResponse rsp, String sLabelList) throws IOException {
private static void normalLabelResponse(StaplerRequest req, StaplerResponse rsp, String sLabelList) throws IOException {
normalResponse(req, rsp, "<labelResponse><labels>" + sLabelList + "</labels></labelResponse>");
}

private static void normalResponse(StaplerRequest req, StaplerResponse rsp, String rawXml) throws IOException {
rsp.setContentType("text/xml");

Writer w = rsp.getCompressedWriter(req);
w.write("<labelResponse><labels>" + sLabelList + "</labels></labelResponse>");
w.write(rawXml);
w.close();
}

Expand All @@ -106,7 +112,7 @@ public void doAddSlaveLabels(StaplerRequest req, StaplerResponse rsp, @QueryPara
nn.setLabelString(setToString(hs));
nn.getAssignedLabels();

normalResponse(req, rsp, nn.getLabelString());
normalLabelResponse(req, rsp, nn.getLabelString());
}

private static String setToString(Set<String> labels) {
Expand Down Expand Up @@ -134,7 +140,7 @@ public void doRemoveSlaveLabels(StaplerRequest req, StaplerResponse rsp, @QueryP
hs.removeAll(lBadLabels);
nn.setLabelString(setToString(hs));
nn.getAssignedLabels();
normalResponse(req, rsp, nn.getLabelString());
normalLabelResponse(req, rsp, nn.getLabelString());
}

/**
Expand Down Expand Up @@ -221,6 +227,38 @@ public void doCreateSlave(StaplerRequest req, StaplerResponse rsp, @QueryParamet
}
}

/**
* Allows the client to mark a slave offline
*/
public void doMarkSlaveOffline(StaplerRequest req, StaplerResponse rsp, @QueryParameter String name,
@QueryParameter String reason, @QueryParameter String secret) throws IOException {
if (!getSwarmSecret().equals(secret)) {
rsp.setStatus(SC_FORBIDDEN);
return;
}

Jenkins jenkins = Jenkins.getInstance();

jenkins.checkPermission(SlaveComputer.DISCONNECT);

Node node = getNodeByName(name, rsp);

node.toComputer().setTemporarilyOffline(true, new OfflineCause.UserCause(User.current(), reason));

normalResponse(req, rsp, "<response/>"); //TODO: not sure what to send here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either offhand. Assuming that the default response status is javax.servlet.http.HttpServletResponse.SC_OK, can we rely on the response status alone and simply avoid calling rsp.getCompressedWriter() in the first place for such responses? I can look into this further if you're having trouble.

}

public void doGetSlaveReadyForShutdown(StaplerRequest req, StaplerResponse rsp, @QueryParameter String name) throws IOException {
Jenkins jenkins = Jenkins.getInstance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding checking for secrets/permissions: I can't see any disadvantages to checking, and I can't see any advantages to not checking. In contrast, if we don't check, we increase the attack surface for potential security issues. Based on the above reasoning, I think I'd prefer to check secrets/permissions.


Node node = getNodeByName(name, rsp);

boolean ready = node.toComputer().isOffline() && node.toComputer().isIdle();

normalResponse(req, rsp, "<slaveInfo><readyStatus>" + ready + "</readyStatus></slaveInfo>");
}


private static List<ToolLocation> parseToolLocations(String[] toolLocations) {
List<ToolLocationNodeProperty.ToolLocation> result = new ArrayList<>();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@
import java.util.Set;
import org.apache.commons.lang.RandomStringUtils;
import org.apache.commons.lang.math.NumberUtils;
import org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition;
import org.jenkinsci.plugins.workflow.job.WorkflowJob;
import org.jenkinsci.plugins.workflow.job.WorkflowRun;
import org.junit.After;
import org.junit.Assume;
import org.junit.ClassRule;
Expand Down Expand Up @@ -268,6 +271,30 @@ public void pidFileDeletedOnExit() throws Exception {
assertFalse("PID file removed", pidFile.exists());
}


@Test
public void gracefulShutdownOnKill() throws Exception {
TestUtils.SwarmClientProcessWrapper node =
TestUtils.runSwarmClient(
"agentDeletePid",
j,
processDestroyer,
temporaryFolder);

TestUtils.waitForNode("agentDeletePid", j);


// Now run a job that will hold up the node.
WorkflowJob project = j.createProject(WorkflowJob.class);
project.setDefinition(new CpsFlowDefinition("sleep(10)", true));
WorkflowRun build = project.scheduleBuild2(0).waitForStart();

node.process.destroy();
node.process.waitFor();

j.assertBuildStatusSuccess(j.waitForCompletion(build));
}

/**
* @return a dedicated unique PID file object for an as yet non-existent file.
* @throws IOException
Expand Down