Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not drain worker tasks on main thread #47452

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dinfuehr
Copy link

@dinfuehr dinfuehr commented Apr 6, 2023

Hi!

This PR stops draining worker tasks on the main thread. I believe it's not necessary since V8 will join its own worker tasks as part of v8::Isolate::Dispose. It may also cause deadlocks now that those worker tasks may ask the main thread to perform a GC, see the discussion in this V8 issue here. This should help enable concurrent sparkplug again, which will get/is disabled with this PR.

Do you know why Node is draining worker tasks here? Am I seeing right that NodePlatform::DrainTasks is also invoked from the event loop here? If so, this might be a problem for performance as we block the main thread until those worker tasks are finished when the event loop is empty.

I have to admit that I didn't run tests yet for that PR and it may require V8 patches as well (see the V8 issue linked above) but I opened the PR already to discuss this.

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Apr 6, 2023
@targos
Copy link
Member

targos commented Apr 7, 2023

@dinfuehr
Copy link
Author

dinfuehr commented Apr 8, 2023

Yes, they are necessary for this.

@joyeecheung
Copy link
Member

joyeecheung commented Apr 8, 2023

I think @addaleax would be the best person to answer the questions since she wrote some of the initial patches of the migration to v8 platform, but my 2 cents:

Am I seeing right that NodePlatform::DrainTasks is also invoked from the event loop here?

Yes.

Do you know why Node is draining worker tasks here?

This seems to date back to the initial patches of V8 platform integration, but the same question was asked in https://github.com/v8/node/pull/69/files#r187129604 too. From some local testing it seems if we don't drain the worker tasks here (specifically, in the event loop), some foreground task (e.g. a wasm async compile task, which probably comes from the cjs-module-lexer) could get posted after the event loop already finishes, and the general processing order can be messed up - which is why there are a lot of exit code 13 failures in the test here, that means we have some leftover microtasks not yet cleared from the queue during shutdown. We run the microtasks after running foreground tasks here

InternalCallbackScope cb_scope(env, Object::New(isolate_), { 0, 0 },
(in the InternalCallbackScope destructor) when the Node.js environment is still around e.g. when we are still running the event loop. (Personally I am a bit puzzled why we are using InternalCallbackScope for this because its destructor does..a lot of things, probably way more than what PerIsolatePlatformData::RunForegroundTask needs).

@aduh95
Copy link
Contributor

aduh95 commented May 12, 2024

It looks like there are linter and test failures, could you please reabase and address those?

@aduh95 aduh95 added the stalled Issues and PRs that are stalled. label May 12, 2024
Copy link
Contributor

This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.

Copy link
Contributor

Closing this because it has stalled. Feel free to reopen if this issue/PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions.

@github-actions github-actions bot closed this Jun 12, 2024
@targos targos reopened this Sep 12, 2024
@targos targos removed the stalled Issues and PRs that are stalled. label Sep 12, 2024
@targos
Copy link
Member

targos commented Sep 12, 2024

I checked out the PR, rebased it and fixed the linter issues but for some reason I'm unable to push it back:

$ git push [email protected]:dinfuehr/node di/do-not-drain-worker-tasks -f
ERROR: Permission to dinfuehr/node.git denied to targos.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

In any case, here's a copy on my fork: https://github.com/targos/node/tree/do-not-drain-worker-tasks

src/node_platform.cc Outdated Show resolved Hide resolved
@targos
Copy link
Member

targos commented Sep 13, 2024

@aduh95 Thanks! Now I'm curious: how did you do it?

Copy link

codecov bot commented Sep 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.06%. Comparing base (e21984b) to head (ab23463).
Report is 44 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #47452      +/-   ##
==========================================
+ Coverage   88.04%   88.06%   +0.01%     
==========================================
  Files         651      652       +1     
  Lines      183409   183543     +134     
  Branches    35828    35864      +36     
==========================================
+ Hits       161487   161635     +148     
+ Misses      15174    15160      -14     
  Partials     6748     6748              
Files with missing lines Coverage Δ
src/node_platform.cc 87.80% <100.00%> (-0.03%) ⬇️

... and 55 files with indirect coverage changes

@aduh95
Copy link
Contributor

aduh95 commented Sep 13, 2024

Now I'm curious: how did you do it?

I rebased using the GH "Update branch" API – I initially tried pushing your branch using git, and saw the same error as you.

@RedYetiDev
Copy link
Member

RedYetiDev commented Oct 15, 2024

This might be already known, but from my testing, it's worker_thread_task_runner_->BlockingDrain(); that is hanging. Regardless of how it's executed (main thread or not), it shouldn't hang, no?

More specifically, it's the while loop at

node/src/node_platform.cc

Lines 639 to 641 in 2545b9e

while (outstanding_tasks_ > 0) {
tasks_drained_.Wait(scoped_lock);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants