Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-33586 fix double checked locking pattern in dali #19585

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

streeterd
Copy link
Contributor

@streeterd streeterd commented Mar 5, 2025

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

github-actions bot commented Mar 5, 2025

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33586

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@streeterd streeterd force-pushed the HPCC-33586_Fix_double-checked_locking_pattern_for_CNamedGroupStore_in_Dali branch from 192c358 to 9f6be12 Compare March 5, 2025 13:53
@streeterd streeterd self-assigned this Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

@streeterd streeterd marked this pull request as ready for review March 5, 2025 14:05
@streeterd streeterd changed the title HPCC-33586 fix double checked locking pattern for c named group store in dali HPCC-33586 fix double checked locking pattern for CNamedGroupStore in dali Mar 11, 2025
@streeterd streeterd marked this pull request as draft March 11, 2025 10:26
Change DFdir to be an atomic pointer for thread safety.

Signed-off-by: Dave Streeter <[email protected]>
Change DFdir to be an atomic pointer for thread safety.

Signed-off-by: Dave Streeter <[email protected]>
@streeterd streeterd force-pushed the HPCC-33586_Fix_double-checked_locking_pattern_for_CNamedGroupStore_in_Dali branch from 9f6be12 to 53ebcce Compare March 11, 2025 10:50
@streeterd streeterd marked this pull request as ready for review March 11, 2025 10:51
@@ -8239,20 +8239,22 @@ class CNamedGroupStore: implements INamedGroupStore, public CInterface

};

static CNamedGroupStore *groupStore = NULL;
static std::atomic<CNamedGroupStore *> groupStore{nullptr};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a mixture of fixes here. Either the variable should be atomic, or it should be accessed within a critical section. No need to do both.

static CriticalSection groupsect;

bool CNamedGroupIterator::match()
{
if (conn.get()) {
if (matchgroup.get()) {
if (!groupStore)
CLeavableCriticalBlock block3(groupsect);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This critical section will be held over the call to dolookup. That is likely to cause extra thread contention.

If you are worried about groupStore being destroyed, a more efficient solution is to copy and link it with the critical block, and then use the linked pointer. However, see note on closedownDFS() below - in which case it can never be freed.

@@ -9204,7 +9205,7 @@ GetFileClusterNamesType CDistributedFileDirectory::getFileClusterNames(const cha
// --------------------------------------------------------


static CDistributedFileDirectory *DFdir = NULL;
static std::atomic<CDistributedFileDirectory *> DFdir{nullptr};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments to above - either make it atomic, or protect all access within a critical section.

@@ -9229,7 +9228,7 @@ void closedownDFS() // called by dacoven
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is closedownDFS() ever called - I don't think so. If not the function can be deleted, avoiding the lifetime problem above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closedownDFS() is called from dali/base/dacoven.cpp :

image

@ghalliday
Copy link
Member

Comment about closedownDFS() is inaccurate - I was only searching open documents, so missed the call.

As per PR comments remove unnecessary CriticalSection

Signed-off-by: Dave Streeter <[email protected]>
@streeterd streeterd requested review from Copilot and ghalliday March 11, 2025 16:39

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

if (e->errorCode()!=DCERR_server_closed)
throw;
e->Release();
if (groupStore.load())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday would it be safer to move up the groupStore release before the DFdir delete in case the DFdir delete ends up throwing an exception? Or not bother?

@streeterd streeterd changed the title HPCC-33586 fix double checked locking pattern for CNamedGroupStore in dali HPCC-33586 fix double checked locking pattern in dali Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants