CRIU Pingperf adds test loop for SharedClasses validation #5541

Open
LongyuZhang opened this issue Sep 3, 2024 · 13 comments

Comments

@LongyuZhang
Contributor

Based on issue eclipse-openj9/openj9#20012, OpenLiberty reuses an established shared classes cache across multiple servers, so we need to extend the Pingperf test to loop several times inside the container to validate the built shared classes.
FYI @tajila @llxia

@LongyuZhang
Contributor Author

Tested creating Pingperf checkpoint images with the 0.46 release (grinder link):

  • docker-na.artifactory.swg-devops.com/sys-rt-docker-local/grinder/pingperf_17-openj9-ubi-9-linux_ppc-64-sw.os.rhel.8-hw.arch.ppc64le.p10:42970
  • docker-na.artifactory.swg-devops.com/sys-rt-docker-local/grinder/pingperf_17-openj9-ubi-8-linux_ppc-64-sw.os.rhel.8-hw.arch.ppc64le.p10:42969

I then ran these images inside a podman container and executed the following commands multiple times:

  • /opt/ol/wlp/bin/server start
  • /opt/ol/wlp/bin/server stop
  • /opt/ol/wlp/bin/server run defaultServer

Output is:

sh-4.4$ /opt/ol/wlp/bin/server start
Starting server defaultServer.
CWWKE0953W: This version of Open Liberty is an unsupported early release version.
Server defaultServer started with process ID 1026.
sh-4.4$ /opt/ol/wlp/bin/server stop
Stopping server defaultServer.
Server defaultServer stopped.
sh-4.4$ /opt/ol/wlp/bin/server run defaultServer
[AUDIT   ] Launching defaultServer (Open Liberty 24.0.0.9-beta/wlp-1.0.92.cl240820240729-1903) on Eclipse OpenJ9 VM, version 17.0.12+7 (en_US)
[AUDIT   ] CWWKT0016I: Web application available (default_host): http://363c9c36ea31:9080/pingperf/
[AUDIT   ] CWWKC0452I: The Liberty server process resumed operation from a checkpoint in 0.061 seconds.
[AUDIT   ] CWWKZ0001I: Application pingperf started in 0.062 seconds.
[AUDIT   ] CWWKF0012I: The server installed the following features: [cdi-3.0, concurrent-2.0, jndi-1.0, jsonp-2.0, restfulWS-3.0, restfulWSClient-3.0, servlet-5.0].
[AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 0.067 seconds.
^C[AUDIT   ] CWWKE0085I: The server defaultServer is stopping because the JVM is exiting.
[AUDIT   ] CWWKE1100I: Waiting for up to 30 seconds for the server to quiesce.
[AUDIT   ] CWWKT0017I: Web application removed (default_host): http://363c9c36ea31:9080/pingperf/
[AUDIT   ] CWWKZ0009I: The application pingperf has stopped successfully.

I was not able to reproduce the error. What extra tests do we need to run to trigger the SCC failure?
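
When the loop is automated, each `server run` iteration needs a pass/fail signal. A minimal sketch, assuming the console output of each run is captured to a file: it checks for the two message codes that appear in the output above (CWWKC0452I for the checkpoint restore, CWWKF0011I for server ready). The function name `check_restore_log` is hypothetical and is not part of the existing Pingperf test.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a captured `server run` log contains
# both the checkpoint-restore message and the server-ready message.
check_restore_log() {
    log="$1"   # path to a file holding the captured console output
    for code in CWWKC0452I CWWKF0011I; do
        if ! grep -q "$code" "$log"; then
            echo "missing $code in $log" >&2
            return 1
        fi
    done
    echo "restore confirmed"
}
```

This only confirms that the restore completed; it does not inspect the shared classes cache itself.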

@tajila
Contributor

tajila commented Sep 4, 2024

How many iterations did you run?

@tajila
Contributor

tajila commented Sep 4, 2024

@tjwatson FYI

@LongyuZhang
Contributor Author

LongyuZhang commented Sep 4, 2024

> How many iterations did you run?

Around 10 iterations. I can increase that to 50 and try again.
I ran the start/stop cycle 50 times, with the same result.
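
The start/stop cycle described above can be scripted roughly as follows. This is a sketch rather than the actual test code: `cycle_server` and the `SERVER_BIN` variable are hypothetical names, and the default path is the one used in the commands earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: cycle a Liberty server through start/stop N times
# and stop at the first failing iteration.
SERVER_BIN="${SERVER_BIN:-/opt/ol/wlp/bin/server}"

cycle_server() {
    iterations="$1"              # number of start/stop cycles
    name="${2:-defaultServer}"   # server name, defaults to defaultServer
    i=1
    while [ "$i" -le "$iterations" ]; do
        echo "iteration $i"
        "$SERVER_BIN" start "$name" || { echo "start failed at iteration $i" >&2; return 1; }
        "$SERVER_BIN" stop "$name" || { echo "stop failed at iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $iterations start/stop cycles"
}
```

Calling `cycle_server 50` inside the container would repeat the 50-iteration run; a non-zero return identifies the iteration that failed.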

@tajila
Contributor

tajila commented Sep 4, 2024

Do you have a link to the Dockerfiles that you are using for the test?

@LongyuZhang
Contributor Author

@tajila
Contributor

tajila commented Sep 4, 2024

@tjwatson Do you know what we are doing differently from Liberty testing?

@tjwatson

tjwatson commented Sep 5, 2024

> @tjwatson Do you know what we are doing differently from Liberty testing?

Our automated testing does not use container images. Instead, it starts and stops various servers that all use the same shared classes cache. We also have other reports of failures in the scripts used to build an application image, such as the configure.sh script, which starts and stops the server many times.

@llxia
Contributor

llxia commented Sep 5, 2024

@tjwatson Could you point us to the automated test that identified this issue? We’re interested in exploring the possibility of incorporating it into our testing pipeline to catch such issues earlier.

@llxia
Contributor

llxia commented Sep 17, 2024

@tjwatson could you provide us some more info? Thanks

@LongyuZhang
Contributor Author

After discussing with @hangshao0, I created a test case that runs 10 Liberty servers simultaneously against one shared classes cache. It still did not reproduce the error.
These are the steps I tried:

  1. Set jdk_x64_linux_21.0.4_7_openj9-0.46.0 as the default Java on a RHEL 9 xlinux machine and installed apache-maven-3.9.9.
  2. Ran git clone https://github.com/openliberty/guide-getting-started.git, then added these JVM options to guide-getting-started/start/target/liberty/wlp/usr/shared/jvm.options:
     -Xshareclasses:name=openj9_system_scc,cacheDir=/opt/java/.scc
     -Xscmx300m
  3. Duplicated this folder 10 times and changed each server's port in start/pom.xml to avoid conflicts.
  4. Started the 10 servers together (tried several times) and monitored the SCC stats. I did see 10 JVMs reading from and writing to the SCC simultaneously, but the failure did not reproduce.
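
The duplication and monitoring steps above could be scripted along these lines. This is a sketch under stated assumptions, not the exact commands that were run: `make_copies` and `scc_stats` are hypothetical names, and the port rewrite assumes the default ports 9080 and 9443 appear literally in start/pom.xml. The cache name and directory are the ones from the JVM options above, and `printStats` is the OpenJ9 `-Xshareclasses` suboption for dumping cache statistics.

```shell
#!/bin/sh
# Sketch only: duplicate a built guide-getting-started checkout N times,
# giving each copy its own HTTP/HTTPS ports.
# ASSUMPTION: the default ports 9080 and 9443 appear literally in start/pom.xml.
make_copies() {
    src="$1"      # path to the original checkout
    count="$2"    # number of copies to create
    n=1
    while [ "$n" -le "$count" ]; do
        dst="${src}-${n}"
        cp -R "$src" "$dst"
        pom="$dst/start/pom.xml"
        # Shift both ports by n so the copies do not collide.
        sed -e "s/9080/$((9080 + n))/g" -e "s/9443/$((9443 + n))/g" "$pom" > "$pom.tmp"
        mv "$pom.tmp" "$pom"
        echo "created $dst (http $((9080 + n)), https $((9443 + n)))"
        n=$((n + 1))
    done
}

# Print statistics for the shared cache named in the steps above.
scc_stats() {
    java -Xshareclasses:name=openj9_system_scc,cacheDir=/opt/java/.scc,printStats
}
```

For example, `make_copies guide-getting-started 10` followed by starting each copy, then `scc_stats` while the servers are up, approximates steps 3 and 4.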

@tjwatson

> on rhel9 xlinux machine

We saw this on Ubuntu 22.04 x86-64 machines. I doubt that is the difference, but thought I would point it out.

@LongyuZhang
Contributor Author

> We saw this on Ubuntu 22.04 x86-64 machines. I doubt that is the difference, but thought I would point it out.

Thanks for the info! I tested on an Ubuntu 22 xlinux machine with the same steps as in #5541 (comment) and still did not reproduce the failure.
