[improve][ci] Detect test thread leaks in CI builds and add tooling for resource leak investigation #21450
Motivation
Each test should clean up the resources it creates. Failing to do so can cause memory leaks and, in some cases, unnecessary CPU consumption. These issues can in turn slow down test execution, lengthen Pulsar CI builds, and cause flakiness. Thread leaks are a significant source of memory leaks in Pulsar tests. This PR adds reporting and tooling to detect thread leaks in tests.
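To illustrate the general idea of thread leak detection, here is a minimal sketch that snapshots the live threads before a test and reports any new threads still alive afterwards. It is not the code in this PR: the class `ThreadLeakSketch` and its methods `liveThreads` and `leakedSince` are hypothetical names chosen for this example, and a real listener would also need to filter out JVM and framework threads that start on their own.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; not the ThreadLeakDetectorListener from this PR.
public class ThreadLeakSketch {

    // Snapshot of all threads that are currently alive in this JVM.
    static Set<Thread> liveThreads() {
        return new HashSet<>(Thread.getAllStackTraces().keySet());
    }

    // Threads that are alive now but were not present in the earlier snapshot.
    static Set<Thread> leakedSince(Set<Thread> before) {
        Set<Thread> leaked = liveThreads();
        leaked.removeAll(before);
        leaked.removeIf(t -> !t.isAlive());
        return leaked;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Thread> before = liveThreads();

        // Simulate a test that starts a thread and forgets to stop it.
        CountDownLatch stop = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                stop.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "leaky-test-thread");
        worker.setDaemon(true);
        worker.start();

        // Report everything that appeared since the snapshot was taken.
        for (Thread leaked : leakedSince(before)) {
            System.out.println("Possible thread leak: " + leaked.getName());
        }

        // Clean up so the sketch itself does not leak.
        stop.countDown();
        worker.join();
    }
}
```

In a test framework, the "before" snapshot would typically be taken when a test class starts and the comparison made when it finishes. Unrelated JVM threads may start in between, so the report can contain false positives unless known thread names are excluded.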
Issues / requirements:
Proposed Solutions:
Heap dump tooling for tests:
This PR includes a helper class for creating heap dumps programmatically during tests. A heap dump is a snapshot of the heap state, which can be invaluable when debugging issues such as memory leaks. Heap dumps can be inspected with tools such as Eclipse MAT, which offers a query language (OQL) for extracting summaries. In addition, the Calcite plugin for Eclipse MAT makes it possible to query and summarize heap dump information using SQL.
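As a rough sketch of how a heap dump can be triggered from test code, the example below uses the JDK's `HotSpotDiagnosticMXBean.dumpHeap` method. It is not the helper class from this PR: `HeapDumpSketch`, its `dumpHeap` method, and the file naming scheme are assumptions made for illustration, and the approach only works on HotSpot-based JVMs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not the helper class added by this PR.
public class HeapDumpSketch {

    // Writes a heap dump into the given directory and returns the file path.
    // liveOnly = true dumps only reachable objects.
    static Path dumpHeap(Path directory, String prefix, boolean liveOnly) throws IOException {
        Files.createDirectories(directory);
        // The target file must not exist yet; recent JDKs also require the
        // ".hprof" extension. The timestamp keeps names unique across calls.
        Path file = directory.resolve(prefix + "-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(file.toAbsolutePath().toString(), liveOnly);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("heapdumps");
        Path dump = dumpHeap(directory, "test-heap", true);
        System.out.println("Heap dump written to " + dump
                + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting `.hprof` file can then be opened in Eclipse MAT. Dumping only live objects keeps the file smaller and hides garbage that is not relevant to leak analysis.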
Modifications
ThreadLeakDetectorListener Improvements:
Pulsar CI Reporting Enhancements:
TraceTestResourceCleanupListener Introduction:
Documentation
doc
doc-required
doc-not-needed
doc-complete