-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for WFCORE-7097 and Fix for WFCORE-7098 #6283
base: main
Are you sure you want to change the base?
Conversation
@jamezp Please review as you're kind of an SME on this. @jfdenise @yersan @jamezp I put the 27.x label on this mostly to get your attention so you can think whether this needs to be in WF 35 or not. I suspect the only urgency around this is the intermittent failure WFCORE-7097 mentions, and then the bootable jar failures we are seeing in full WF in ts/int/elytron-oidc-client. But those don't force us to do something quickly if we don't think that's the right thing to do; those both may have workarounds. |
bootable-jar/runtime/src/main/java/org/wildfly/core/jar/runtime/InstallationCleaner.java
Outdated
Show resolved
Hide resolved
if (Files.notExists(cleanupMarker)) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow this. Can you explain why we were seeing an issue if the file does not exist here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case, the process is started but the cleanup already occurred (a timeout + a previous process running).
} | ||
// Do a last cleanup, in case the cleanupMarker still exists (could have been deleted by running process). | ||
if (Files.exists(cleanupMarker)) { | ||
cleanup(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could potentially launch another process. We should probably just invoke the deleteDirectory()
at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is done in purpose. On Windows we need the the external process. The cleanup waits until the process terminate (with a timeout).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that if the process is running while the other the bootable JAR process is still running that we terminate that process and start a new one? I'm just a little confused on what we gain here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is in case, the previous process didn't complete the deletion for some reason (timeout and process forcibly terminated from the caller thread), we start a new process to finalize.
Core -> Full Integration Build 14432 outcome was UNKNOWN using a merge of a7eede0 |
Core -> Full Integration Build 14131 outcome was UNKNOWN using a merge of a7eede0 |
Core -> WildFly Preview Integration Build 14213 outcome was UNKNOWN using a merge of a7eede0 |
a7eede0
to
5e6e784
Compare
Core -> Full Integration Build 14437 outcome was UNKNOWN using a merge of 5e6e784 |
Core -> Full Integration Build 14136 outcome was UNKNOWN using a merge of 5e6e784 |
Core -> Full Integration Build 14137 outcome was FAILURE using a merge of 5e6e784 Failed tests
|
This comment was marked as off-topic.
This comment was marked as off-topic.
Core -> WildFly Preview Integration Build 14220 outcome was UNKNOWN using a merge of 5e6e784 |
This comment was marked as off-topic.
This comment was marked as off-topic.
@@ -45,28 +46,13 @@ class InstallationCleaner implements Runnable { | |||
} | |||
|
|||
@Override | |||
public void run() { | |||
public synchronized void run() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am missing something; this task is submitted by a SingleThreadExecutor
in the BootableJar shutdown hook.
If we are marking it as synchronized, it only means that we could have more than one Bootable Jar instance for the same server home, right?
If that's true, the same marked file is also shared as a marker across the multiple Bootable JAR instances launched from the same home, wouldn't that be an issue after all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this task is submitted by a SingleThreadExecutor in the BootableJar shutdown hook.
If we are marking it as synchronized, it only means that we could have more than one Bootable Jar instance for the same server home, right?
Well, even in that case, we are creating new instances of this InstallationCleaner on each shuftdown hook, so I don't get why the synchronized is required (or nice to have ) at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yersan , we could have a timeout on the calling thread. The task (running in its own thread) is not yet done, and the calling thread will attempt to do a cleanup again. We need to synchronize at this point to avoid multiple cleanup in //. synchronize enforces it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfdenise ok, so it is not to allow dealing with muiltiple Bootable JARs from the same server home.
Ok, in that case, should not be InstallationCleaner.cleanup() method be the one that should be synchronized?
That's the method in common with the InstallationCleaner.run()
and InstallationCleaner.cleanupTimeout()
which are the entry points for the submitted task and the explicit cleaner.cleanupTimeout()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yersan The key piece is: Files.notExists(cleanupMarker)
We need to have all threads to share a common view on it. So all entry points to it should be synchronized.
FYI, I am constantly re-running the 2 bootable JAR jobs, my goal is to run 10 times on the 2 platforms with no issues. |
@yersan , 10 green runs on each platform. I will stop testing. |
@jamezp Can you review again? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving, but I think we might need to revisit this. I don't want to block fixing CI though.
I don't think this necessarily wrong, I think we're just overly using resources. We have a shutdown hook which attempts to cleanup the resources. On Windows these typically can't be cleaned up until the server process is ended. However, we attempt to wait for the process to end, then we launch another process as a final clean up. I think we could stream line this a bit, but I'd need to think it through a little more.
@@ -153,7 +159,9 @@ private void newProcess() throws IOException { | |||
.redirectError(ProcessBuilder.Redirect.INHERIT) | |||
.redirectOutput(ProcessBuilder.Redirect.INHERIT) | |||
.directory(new File(System.getProperty("user.dir"))); | |||
builder.start(); | |||
process = builder.start(); | |||
process.waitFor(environment.getTimeout(), TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also sounds a bit inappropriate, since this method could be invoked directly from the shutdown hook thread. The shutdown hook API says that is inadvisable to attempt any user interaction or to perform a long-running computation in a shutdown hook.
I guess my question would be if this is somehow killed completely by the JVM, if there could be possibilities of having the started process running around.
In any case, if decided, we can move on and see how it behaves in the CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yersan , I was thinking to it more, and I think that we shouldn't merge it. Although I am confident on the Linux fix, it requires more on Windows front.
5e6e784
to
b4410ea
Compare
…ssionsDeploymentTestCase.testWithConfiguredMaxBootThreads
… module when ts.bootable
b4410ea
to
913f86c
Compare
@jamezp , When testing with a lot of corner cases (with complex scheduling scenarii) on Windows, I came to the conclusion that we need, in the forked process, to wait for the server process to terminate prior to delete the installation. That is the only way to ensure that nothing is left behind and the installation is actually deleted. |
Core -> Full Integration Build 14158 outcome was FAILURE using a merge of 913f86c Failed tests
|
Core -> Full Integration Build 14159 outcome was FAILURE using a merge of 913f86c Failed tests
|
Core -> WildFly Preview Integration Build 14241 outcome was FAILURE using a merge of 913f86c Failed tests
|
Core -> WildFly Preview Integration Build 14242 outcome was FAILURE using a merge of 913f86c Failed tests
|
@jfdenise Yes. That is what it's supposed to be doing currently. I guess I should read the Jira's to see what problem we're trying to solve. One thing I'm not sure why I'd originally done, is create an |
Looking at the The |
ISSUES: