Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-34814 mysqld hangs on startup when --init-file target does not e… #3536

Merged
merged 1 commit into from
Oct 15, 2024

Conversation

DaveGosselin-MariaDB
Copy link
Member

@DaveGosselin-MariaDB DaveGosselin-MariaDB commented Sep 20, 2024

…xist

Set select_thread_in_use only when we're about to enter into the polling loop, not sooner, allowing early proces aborts to exist cleanly: the process won't be waiting for a polling loop that isn't yet polling.

@DaveGosselin-MariaDB DaveGosselin-MariaDB self-assigned this Sep 20, 2024
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Member

@vuvova vuvova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excessive. select_thread_in_use= 1 means that break_connect_loop() should wait for the signal. And the signal is broadcasted after handle_connections_sockets().

So, if the server startup is aborted, there's no signal. That simply means that select_thread_in_use= 1 should be done after the last unireg_abort() before handle_connections_sockets(). And all these select_thread_in_use= 0 can be removed.

@DaveGosselin-MariaDB
Copy link
Member Author

This looks excessive. select_thread_in_use= 1 means that break_connect_loop() should wait for the signal. And the signal is broadcasted after handle_connections_sockets().

Please look more closely at the patch.

So, if the server startup is aborted, there's no signal.

This is false. unireg_abort calls wait_for_signal_thread_to_end which causes the process to send SIGTERM to itself.

Calling unireg_abort from mysqld_main used to work fine until the call to wait_for_signal_thread_to_end was added to unireg_abort in git commit 814dc46. Because of this call, we end up waiting forever for select_thread_in_use to be set to zero. This change allows the process to quit prematurely when break_connect_loop() is called from signal_hand(), which will happen when processing the SIGTERM signal sent by during unireg_abort().

@vuvova
Copy link
Member

vuvova commented Oct 4, 2024

I mean,

  mysql_mutex_lock(&LOCK_start_thread);
  select_thread_in_use=0;
  mysql_cond_broadcast(&COND_start_thread);
  mysql_mutex_unlock(&LOCK_start_thread);

is done after handle_connections_sockets() and break_connect_loop() waits for it

  while (select_thread_in_use)
  {
    set_timespec(abstime, 2);
    for (uint tmp=0 ; tmp < 10 && select_thread_in_use; tmp++)
    {
      error= mysql_cond_timedwait(&COND_start_thread, &LOCK_start_thread,
                                  &abstime);
      if (error != EINTR)
        break;
    }
    close_server_sock();
  }

It's assumed that select_thread_in_use is set to zero when COND_start_thread is signalled. I suppose when the server startup is aborted early, the condition isn't signalled. Meaning select_thread_in_use shouldn't have been 1 in the first place. That's why every early exit sets select_thread_in_use=0.

I think it'd be more logical not to set select_thread_in_use=1 too early, then early aborts wouldn't need to set it to 0.

sql/mysqld.cc Outdated
@@ -5344,6 +5344,26 @@ static void test_lc_time_sz()
#endif//DBUG_OFF


static void signal_accept_connections()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, it's not just "signal accept connections", handle_connection_sockets() is the main program loop, that's where the server spends all its time. Everything before it is startup, everything after it is shutdown.

If you want that code in a separate function — it needs a better name. And also, please, explain your reasoning for extracting it into a function, I don't quite understand it yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😊 I wasn't sure of a good name (I used the word "signal" here not in the Linux signal sense, but that wasn't a good choice clearly). Maybe just enter_main_loop or something like that would be clearer. Yes, I understand that handle_connections_sockets is where we spend all of our time, that's where the select call is and the main program loop. I moved this to a separate function to encapsulate the symmetry of setting select_thread_in_use to true (before entering the main loop) needs to be paired with setting it to false when exiting the main loop. I didn't dig through the git history but I suspect that this value was set closer to the call to handle_connections_sockets in the past, but code was added that moved it away gradually. This function is meant to show that handle_connections_sockets and select_thread_in_use go together (mostly, but we have another place where it is cleared... anyway, step by step).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be run_main_loop or execute_main_loop or something along these lines? perform_ or whatever?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I will go with run_main_loop.

…xist

Set select_thread_in_use only when we're about to enter into the polling
loop, not sooner, allowing early proces aborts to exist cleanly: the
process won't be waiting for a polling loop that isn't yet polling.
@DaveGosselin-MariaDB DaveGosselin-MariaDB merged commit e9c71f3 into 10.5 Oct 15, 2024
8 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants