Skip to content
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Fix unknown exception caused by thread safety issue #2033

Merged
merged 4 commits into from
Jan 4, 2024

Conversation

315157973
Copy link
Contributor

Motivation

Both the response thread and timeout detection thread will access PendingProduceCallback
This problem will easily occur when the sending latency is close to the timeout threshold.

Step-1: response thread and timeout detection thread access 1 at the same time, none of them meet the conditions for return

Step-2: timeout detection thread execute to 2, and set responseMap = null
Setp-3: response thread execute to 3, cause NPE

image

Modifications

Add synchronized

Documentation

  • no-need-doc

@CLAassistant
Copy link

CLAassistant commented Dec 28, 2023

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the no-need-doc This pr does not need any document label Dec 28, 2023
@315157973 315157973 removed their assignment Dec 28, 2023
@BewareMyPower
Copy link
Collaborator

Could you sign the CLA?

Demogorgon314
Demogorgon314 previously approved these changes Dec 29, 2023
Copy link
Collaborator

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spotbugs check failed:

Medium: Inconsistent synchronization of io.streamnative.pulsar.handlers.kop.storage.ReplicaManager$PendingProduceCallback.responseMap; locked 60% of time [io.streamnative.pulsar.handlers.kop.storage.ReplicaManager$PendingProduceCallback, io.streamnative.pulsar.handlers.kop.storage.ReplicaManager$PendingProduceCallback, io.streamnative.pulsar.handlers.kop.storage.ReplicaManager$PendingProduceCallback, io.streamnative.pulsar.handlers.kop.storage.ReplicaManager$PendingProduceCallback, io.streamnative.pulsar.handlers.kop.storage.ReplicaManager$PendingProduceCallback] Unsynchronized access at ReplicaManager.java:[line 119]Unsynchronized access at ReplicaManager.java:[line 121]Synchronized access at ReplicaManager.java:[line 127]Synchronized access at ReplicaManager.java:[line 125]Synchronized access at ReplicaManager.java:[line 133] IS2_INCONSISTENT_SYNC

You need to check the synchronized code blocks again.

BTW, in our private fork, this pending produce is removed and the timeout field is not processed.

Copy link

codecov bot commented Dec 31, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (aa2f171) 16.98% compared to head (69ef235) 16.96%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2033      +/-   ##
============================================
- Coverage     16.98%   16.96%   -0.03%     
+ Complexity      730      728       -2     
============================================
  Files           191      191              
  Lines         14310    14310              
  Branches       1339     1339              
============================================
- Hits           2430     2427       -3     
- Misses        11704    11705       +1     
- Partials        176      178       +2     
Files Coverage Δ
...ve/pulsar/handlers/kop/storage/ReplicaManager.java 0.00% <0.00%> (ø)

... and 2 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit 99eeca5 into streamnative:master Jan 4, 2024
19 of 21 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
no-need-doc This pr does not need any document
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants