regression: Windows jobs consistently freezing after git clean, before build script #203
Comments
https://scala-ci.typesafe.com/computer/jenkins-worker-windows-publish/script
println "jps -v".execute().text
println "jstack 2328".execute().text
Threads galore stuck in:
|
The most similar stack trace I've found on the interwebs is http://pastebin.com/m6mtGbNW posted in the last day. (Was that you, @SethTisue ?) |
https://scala-ci.typesafe.com/threadDump appears to be an easier way to get thread dumps for the entire Jenkins cluster. |
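Once a dump is saved as text, tallying where each thread is parked makes a pile-up easier to spot than scrolling through it. A minimal sketch in Python, assuming the usual jstack-style layout (a quoted thread-name line followed by `at ...` frames). The sample dump, the `com.example` class names, and the helper `top_frames` are all made up for illustration; none of it is real scala-ci output.

```python
import re
from collections import Counter

# Hypothetical sample in jstack-like layout; NOT real output from scala-ci.
SAMPLE_DUMP = '''\
"worker-1" #12 prio=5 tid=0x1 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Native.<clinit>(Native.java:10)
    at com.example.Caller.run(Caller.java:5)

"worker-2" #13 prio=5 tid=0x3 nid=0x4 runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Native.<clinit>(Native.java:10)

"idle" #14 prio=5 tid=0x5 nid=0x6 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
'''

# A thread section starts with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"[^"]+"')
# A frame line looks like "    at pkg.Class.method(File.java:123)";
# capture only the qualified method name, up to the opening parenthesis.
FRAME = re.compile(r'^\s+at\s+(?P<frame>[^(\s]+)')


def top_frames(dump: str) -> Counter:
    """Count threads by their topmost stack frame."""
    counts = Counter()
    awaiting_frame = False
    for line in dump.splitlines():
        if THREAD_HEADER.match(line):
            awaiting_frame = True
        elif awaiting_frame:
            match = FRAME.match(line)
            if match:
                counts[match.group("frame")] += 1
                awaiting_frame = False
    return counts


if __name__ == "__main__":
    for frame, count in top_frames(SAMPLE_DUMP).most_common():
        print(count, frame)
```

Many threads sharing one top frame, especially a `<clinit>` (static initializer) frame, is the kind of signature worth comparing against the linked Jenkins tickets.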
Maybe related: https://issues.jenkins-ci.org/browse/JENKINS-16070
|
Even closer: https://issues.jenkins-ci.org/browse/JENKINS-39179
|
We could try disabling the cygpath plugin, which seems to form part of a static-initializer-based deadlock or race condition with the Another workaround would be to patch the cygpath plugin not to use JNA to look up the location of
|
How come this just started failing? Could we give wiping the workspace a shot? |
beats the hell out of me, I don't even have a guess
yeah, I should have mentioned, one of the first things I did was |
The jenkins slave needs a restart for sure, it won't come good after the JNA race condition has been triggered. From the analysis on those tickets, the problem is a race on startup between different threads initializing JNA. Two spots that do that are the |
One workaround suggested on the ticket is to:
This assumes a recent enough version of Jenkins (2.4+). That claims to do |
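For anyone unfamiliar with this failure class, here is a small Python analogy of two threads that each hold one initialization lock while waiting for the other's. It is only a sketch of the general shape: the two locks are stand-ins for JVM class-initialization locks, the names are hypothetical, and nothing here is taken from JNA or the cygpath plugin.

```python
import threading

# Stand-ins for two class-initialization locks (hypothetical names).
lock_a = threading.Lock()
lock_b = threading.Lock()

# Barriers force the unlucky interleaving on every run.
both_hold_first = threading.Barrier(2)
both_attempted = threading.Barrier(2)

results = {}


def initialize(name, first, second):
    with first:
        # Wait until the other thread also holds its first lock.
        both_hold_first.wait()
        # A real class-init lock has no timeout; one is used here
        # only so the demo terminates instead of hanging forever.
        got_second = second.acquire(timeout=0.2)
        results[name] = got_second
        if got_second:
            second.release()
        # Keep holding the first lock until both attempts are over,
        # so neither thread can sneak in after the other gives up.
        both_attempted.wait()


t1 = threading.Thread(target=initialize, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=initialize, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(results)
```

Both entries in `results` come out `False`: each thread holds the lock the other needs. Without the timeout, both would wait forever, which fits the observation that the slave does not recover until it is restarted.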
I moved all workspace folders under |
another freeze last night: https://scala-ci.typesafe.com/job/scala-2.12.x-release-package-windows/432/consoleFull |
another one just now: https://scala-ci.typesafe.com/job/scala-2.11.x-release-package-windows/690/consoleFull |
I've noted our woes in https://issues.jenkins-ci.org/browse/JENKINS-39179 and asked for help in configuring a system property on the slave, which is a suggested fix. |
@SethTisue I found the spot to add the JVM option. Should we just manually add |
I've manually added this to the config via the Jenkins UI. If we have a run of successful builds, we should burn that into the Chef recipe for the master. |
hopefully fixes scala#203, see remarks there for details
after a nice long period of quiet, this came up again. all jobs on jenkins-worker-windows-publish were affected. Jason's manual change got lost (presumably a while ago), but I'll PR it now |
hopefully fixes scala#203, see remarks there for details
happened again today, all Windows jobs affected, and blowing away |
example job that hung on |
https://scala-ci.typesafe.com/threadDump is useful here: the thread dumps showed lots of threads hanging around monitoring swap space. Adriaan found this suspicious, disabled swap space monitoring (at https://scala-ci.typesafe.com/computer/configure), and restarted the node. everything seems to be working now (and all those threads don't show anymore) |
if this comes back, we should first try blowing away the workspace directory and restarting the node. if that doesn't do it, we should look at the swap space monitoring setting and see whether Adriaan's change needs to be re-applied. if it does, and if that fixes it, we should PR the change |
oh, for the record, Adriaan also upgraded Git from 2.11 to 2.14, when we still thought it might actually be freezing during |
today: all Windows jobs were freezing. deleting the workspaces didn't help. the swap space monitoring setting was still off, but there were a lot of swap space monitoring threads hanging around (though we still don't know if that's even a factor here). rebooting the node (by ssh-ing and doing |
happening again. clearly not fixed. I'll comment further if I ever find that ssh-ing to the Windows worker and doing |
AFAIR, disabling swap space monitoring doesn't actually disable the background threads that start JNA, which race with the cygwin path plugin's use of JNA. |
There has been some activity on the Jenkins ticket, though sadly not to resolution:
|
haven't seen this in a while, and should be superseded by scala/scala-dev#508 anyway (moving to AppVeyor for Windows testing; Windows publishing was already moved) |
e.g. https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-windows/29/consoleFull
I had hoped this was just an intermittent thing, but for several days now all Windows jobs are affected.
I don't think this is the same as #143, but who knows. I did check that longpaths is set to true in user jenkins' global gitconfig
@adriaanm any ideas? I'm stumped. we won't be able to release 2.12.1 without finding some solution or workaround, as the -package- jobs hang too, see e.g. the string of failures at https://scala-ci.typesafe.com/view/scala-2.11.x/job/scala-2.12.x-release-package-windows/