Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8340401: DcmdMBeanPermissionsTest.java and SystemDumpMapTest.java fail with assert(_stack_base != nullptr) failed: Sanity check #21091

Closed
wants to merge 5 commits into from

Conversation

stooke
Copy link
Contributor

@stooke stooke commented Sep 19, 2024

This PR addresses JDK-8340401 by adding a test for a thread with no stack currently assigned.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8340401: DcmdMBeanPermissionsTest.java and SystemDumpMapTest.java fail with assert(_stack_base != nullptr) failed: Sanity check (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21091/head:pull/21091
$ git checkout pull/21091

Update a local copy of the PR:
$ git checkout pull/21091
$ git pull https://git.openjdk.org/jdk.git pull/21091/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21091

View PR using the GUI difftool:
$ git pr show -t 21091

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21091.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 19, 2024

👋 Welcome back stooke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 19, 2024

@stooke This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8340401: DcmdMBeanPermissionsTest.java and SystemDumpMapTest.java fail with assert(_stack_base != nullptr) failed: Sanity check

Reviewed-by: dholmes, stuefe, kevinw

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 69 new commits pushed to the master branch:

  • 8efc558: 8346378: Cannot use DllMain in libnet for static builds
  • 73b5dba: 8345655: Move reservation code out of ReservedSpace
  • d50b725: 8344647: Make java.se participate in the preview language feature requires transitive java.base
  • 9e8aa85: 8346017: Socket.connect specified to throw UHE for unresolved address is problematic for SOCKS V5 proxy
  • 5b703c7: 8342782: AWTEventMulticaster throws StackOverflowError using AquaButtonUI
  • edbd76c: 8344951: Stabilize write barrier micro-benchmarks
  • 842f801: 8339331: GCC fortify error in vm_version_linux_aarch64.cpp
  • 4533109: 8345911: Enhance error message when IncompatibleClassChangeError is thrown for sealed class loading failures
  • ea50c54: 8321818: vmTestbase/nsk/stress/strace/strace015.java failed with 'Cannot read the array length because "" is null'
  • c0f0b8e: 8346151: Add transformer error logging to VerifyLocalVariableTableOnRetransformTest
  • ... and 59 more: https://git.openjdk.org/jdk/compare/18e0b343ab002b193d4db3be226394ee7dbe9f19...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@tstuefe, @kevinjwalls, @dholmes-ora) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk
Copy link

openjdk bot commented Sep 19, 2024

@stooke The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@stooke stooke changed the title DcmdMBeanPermissionsTest.java and SystemDumpMapTest.java fail with assert(_stack_base != nullptr) failed: Sanity check 8340401: DcmdMBeanPermissionsTest.java and SystemDumpMapTest.java fail with assert(_stack_base != nullptr) failed: Sanity check Sep 19, 2024
@kevinjwalls
Copy link
Contributor

Would be good to go ahead with this.
It seems on Windows that JavaThread with undefined stack base/size can be seen via JavaThreadIteratorWithHandle.
There may be more to investigate as to why this is happening, but guarding against it as a quick fix seems good as the failure is quite noisy.
I'm testing it too, will let you know if anything interesting happens.

@kevinjwalls
Copy link
Contributor

Nothing interesting happened. Meaning, no crashes!

@stooke stooke marked this pull request as ready for review September 20, 2024 07:49
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 20, 2024
@mlbridge
Copy link

mlbridge bot commented Sep 20, 2024

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is good enough. The setting of stack size can race the setting of stack_base - you could observe stack size set, but stack base still unset.

What I would do instead is:

give us a new function in Thread that, by contract, does not assert but returns NULL (similar to e.g. Thread::curren_or_null()):

address stack_base_or_null() const { return _stack_base; }

use this in vma_touches_thread_stack to skip NULL base.

Alternatively, in record_stack_base_and_size, we need a store store barrier between the two writes, and store stack size first.

@tstuefe
Copy link
Member

tstuefe commented Sep 20, 2024

Would be good to go ahead with this. It seems on Windows that JavaThread with undefined stack base/size can be seen via JavaThreadIteratorWithHandle. There may be more to investigate as to why this is happening, but guarding against it as a quick fix seems good as the failure is quite noisy. I'm testing it too, will let you know if anything interesting happens.

I think we have understood why this is happening on Windows. See David's comment in the JBS issue.

@dholmes-ora
Copy link
Member

Alternatively, in record_stack_base_and_size, we need a store store barrier between the two writes, and store stack size first.

I guess this could be a real issue on Windows-aarch64. The storestore barrier would work for me. But we probably need an audit of every JavaThread iterator use in case the same problem can occur elsewhere.

@mlbridge
Copy link

mlbridge bot commented Sep 20, 2024

Mailing list message from Thomas Stüfe on hotspot-runtime-dev:

On Fri, Sep 20, 2024 at 12:44?PM David Holmes <dholmes at openjdk.org> wrote:

On Fri, 20 Sep 2024 08:05:20 GMT, Simon Tooke <stooke at openjdk.org> wrote:

I would prefer it here, or in HANDLE_THREAD macro, to make sure we do
this for all threads printed.

I will put it in HANDLE_THREAD to cover the other tests.

It is not needed for the other tests - those threads can't be new.

This is an issue with using the JavaThread iterator so should be visibly
seen in the itertation loop IMO.

I don't think this is limited to JavaThread

Eg in G1:

- parent thread calls WorkerThreads::set_active_workers
- WorkerThreads::set_active_workers creates thread (A), starts thread (B),
sets them into the workers array (C)
- from point (C) on they are observable and WorkerThreads::threads_do will
iterate them
- Child thread starts running at (B), but I don't see any way to guarantee
that initialization is finished before another thread can observe it via
WorkerThreads::threads_do.

Admittedly, this is improbable. But why risk it? We don't lose anything by
just testing printing for every thread.

-------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-runtime-dev/attachments/20240920/cf7d4dbc/attachment.htm>

Copy link
Contributor

@kevinjwalls kevinjwalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this update in a local tier1 linux and windows, and in the CI tier3 and tier5 windows where I'm seeing all passes, lots of ZGC usage (which remains a more likely way to trigger this, but not essential).

Simon if this attracts another reviewer, I would be happy to see this go ahead and calm down the CI noise (every tier5 run turns up a few failures).

Like the other test change, if we want to meditate further on it and follow up with a better solution that's great.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 20, 2024
@tstuefe
Copy link
Member

tstuefe commented Sep 20, 2024

I tested this update in a local tier1 linux and windows, and in the CI tier3 and tier5 windows where I'm seeing all passes, lots of ZGC usage (which remains a more likely way to trigger this, but not essential).

Have you tested with Windows arm, too?

Simon if this attracts another reviewer, I would be happy to see this go ahead and calm down the CI noise (every tier5 run turns up a few failures).

Like the other test change, if we want to meditate further on it and follow up with a better solution that's great.

Can't we just problem-list it for now? I would like that better than adding a solution that we already know won't work for all cases (which would make e.g. hunting the bug tail when backporting more confusing).

@kevinjwalls
Copy link
Contributor

Have you tested with Windows arm, too?

No. I get the theory, but has anybody seen an issue there?
I'm not against problemlisting if another real solution is in the works.

@tstuefe
Copy link
Member

tstuefe commented Sep 20, 2024

Have you tested with Windows arm, too?

No. I get the theory, but has anybody seen an issue there? I'm not against problemlisting if another real solution is in the works.

Yes, Simon is working on a solution. This may just be a case of too-many-reviewer-opinions-paralyses. Can you problemlist the test or should we do it?

@kevinjwalls
Copy link
Contributor

Have you tested with Windows arm, too?

No. I get the theory, but has anybody seen an issue there? I'm not against problemlisting if another real solution is in the works.

Yes, Simon is working on a solution. This may just be a case of too-many-reviewer-opinions-paralyses. Can you problemlist the test or should we do it?

Would be great if you could, thanks.

@tstuefe
Copy link
Member

tstuefe commented Sep 20, 2024

Have you tested with Windows arm, too?

No. I get the theory, but has anybody seen an issue there? I'm not against problemlisting if another real solution is in the works.

Yes, Simon is working on a solution. This may just be a case of too-many-reviewer-opinions-paralyses. Can you problemlist the test or should we do it?

Would be great if you could, thanks.

okay: #21109

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 20, 2024
@dholmes-ora
Copy link
Member

I don't think this is limited to JavaThread

@tstuefe whether other kinds of thread iterators have potential issues with the state of the threads they may observe is not relevant to the question of whether the proposed check goes in the iteration loop or the HANDLE_THREAD macro - we don't process any GC threads with HANDLE_THREAD. The JavaThread iterator can return a thread that has not yet set its stack so the code using the iterator needs to deal with that if it might be an issue.

@dholmes-ora
Copy link
Member

dholmes-ora commented Sep 21, 2024

Yes, Simon is working on a solution.

One solution may be to stop creating the threads in the suspended state and use the same protocol for initialization that is used for non-Windows threads. At least that way we get consistent behaviour. I wouldn't expect Simon to undertake that task though.

@tstuefe
Copy link
Member

tstuefe commented Sep 22, 2024

I don't think this is limited to JavaThread

@tstuefe whether other kinds of thread iterators have potential issues with the state of the threads they may observe is not relevant to the question of whether the proposed check goes in the iteration loop or the HANDLE_THREAD macro - we don't process any GC threads with HANDLE_THREAD. The JavaThread iterator can return a thread that has not yet set its stack so the code using the iterator needs to deal with that if it might be an issue.

We iterate GC threads (

Universe::heap()->gc_threads_do(&cl);
) here; it calls vma_touches_thread_stack, which is why I proposed in my last comment (#21091 (review)) to fix that function. Am I overlooking something?

@dholmes-ora
Copy link
Member

@tstuefe I think this is a thread lifecycle management issue and so anyone using the JavaThread iterator or GCworkersDo needs to be aware of the possibility of a thread having a null stackbase. Having the check as close as possible to the iteration loop (whichever kind) makes this more visible, rather than hiding it inside another function. I would rather vma_touches_thread_stack be guaranteed to get "good threads" than writing it to allow for "bad ones". Of course even better if we simply don't allow this kind of thread "badness" in the first place.

@tstuefe
Copy link
Member

tstuefe commented Sep 23, 2024

@tstuefe I think this is a thread lifecycle management issue and so anyone using the JavaThread iterator or GCworkersDo needs to be aware of the possibility of a thread having a null stackbase. Having the check as close as possible to the iteration loop (whichever kind) makes this more visible, rather than hiding it inside another function. I would rather vma_touches_thread_stack be guaranteed to get "good threads" than writing it to allow for "bad ones". Of course even better if we simply don't allow this kind of thread "badness" in the first place.

Okay, that makes sense considering that people tend to copy paste solutions.

I leave this up to Simon, then (also the question whether he wants to tackle rewriting Windows thread startup in a manner that we prevent this situation altogether). After all, the test now is problemlisted, so the urgency is gone.

@dholmes-ora
Copy link
Member

As I've heard nothing further here, I've filed JDK-8342482 and will work on it.

@dholmes-ora
Copy link
Member

JDK-8342482 didn't work out.

@tstuefe
Copy link
Member

tstuefe commented Dec 12, 2024

@tstuefe we can't call stack_base() because it will throw an assert if it's null. (hence the stack_base_or_null() addition). Would you like me to check stack_size() instead? (my suggestion is to rename stack_base_or_null() to a more generic "has_valid_stack()")

Oh you are right, I forgot that. No, I think stack_base_or_null is fine in that case.

@stooke
Copy link
Contributor Author

stooke commented Dec 12, 2024

This was closed in error.

@stooke stooke reopened this Dec 12, 2024
@openjdk
Copy link

openjdk bot commented Dec 12, 2024

@stooke Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@stooke stooke requested a review from tstuefe December 13, 2024 16:32
@stooke
Copy link
Contributor Author

stooke commented Dec 13, 2024

@tstuefe I have implemented Thread::stack_base_or_null(), and I expect the Gitlab gates to pass. On my VM, I have removed the tests from ProblemList files, and they pass.
Two questions:

  1. should I check in a SystemDumpMapZGCTest() ?
  2. should I remove these tests from the ProblemList files? (I notice that DcmdMBeanPermissionsTest was actually deleted in JDK-8338411)

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

@dholmes-ora
Copy link
Member

@stooke if there is any test in the ProblemList file that lists 8340401, then the PL needs to be updated. Thanks

@stooke
Copy link
Contributor Author

stooke commented Dec 17, 2024

@stooke if there is any test in the ProblemList file that lists 8340401, then the PL needs to be updated. Thanks

@dholmes-ora , done.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updates

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 18, 2024
@stooke
Copy link
Contributor Author

stooke commented Dec 18, 2024

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Dec 18, 2024
@openjdk
Copy link

openjdk bot commented Dec 18, 2024

@stooke
Your change (at version af871b7) is now ready to be sponsored by a Committer.

@dholmes-ora
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Dec 19, 2024

Going to push as commit b0c40aa.
Since your change was applied there have been 76 commits pushed to the master branch:

  • 6b89954: 8346475: RISC-V: Small improvement for MacroAssembler::ctzc_bit
  • 00d8407: 8346016: Problemlist vm/mlvm/indy/func/jvmti/mergeCP_indy2manyDiff_a in virtual thread mode
  • 5db0a13: 8346132: fallbacklinker.c failed compilation due to unused variable
  • 5590669: 8346570: SM cleanup of tests for Beans and Serialization
  • c8e94ab: 8346532: XXXVector::rearrangeTemplate misses null check
  • f7f2b42: 8346300: Add @test annotation to TCKZoneId.test_constant_OLD_IDS_POST_2024b test
  • a0b7c4f: 8346324: javax/swing/JScrollBar/4865918/bug4865918.java fails in CI
  • 8efc558: 8346378: Cannot use DllMain in libnet for static builds
  • 73b5dba: 8345655: Move reservation code out of ReservedSpace
  • d50b725: 8344647: Make java.se participate in the preview language feature requires transitive java.base
  • ... and 66 more: https://git.openjdk.org/jdk/compare/18e0b343ab002b193d4db3be226394ee7dbe9f19...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 19, 2024
@openjdk openjdk bot closed this Dec 19, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Dec 19, 2024
@openjdk
Copy link

openjdk bot commented Dec 19, 2024

@dholmes-ora @stooke Pushed as commit b0c40aa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dholmes-ora
Copy link
Member

SystemDumpMapTest still fails after this - see JDK-8346717

@stooke
Copy link
Contributor Author

stooke commented Dec 23, 2024

SystemDumpMapTest still fails after this - see JDK-8346717

@dholmes-ora my apologies - this failure (which is timing related) did not sho up in my testing. I will be opening PR #22870 as soon as it passes the GH gates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime [email protected] integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants