grpc-default-executor threads stop processing tasks: how can we troubleshoot in such situation? #8174

Closed
@suztomo

Do you know a good way to troubleshoot the state of "grpc-default-executor" threads?

In apache/beam#14768 (comment), when I tried to upgrade Beam's vendored (shaded) gRPC dependency from gRPC 1.26.0 to 1.37.0 (or 1.36), I observed that some tests (randomly either GrpcLoggingServiceTest or BeamFnLoggingServiceTest) do not finish. Borrowing Kenn's words, BeamFnLoggingServiceTest does the following:

  • start a logging service
  • set up some stub clients, each with onError wired up to release a countdown latch
  • send error responses to all three of them (actually it sends the error in the same task it creates the stub)
  • each task waits on the latch

(GrpcLoggingServiceTest has a similar structure.)
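
For readers who do not know the test, here is a minimal sketch of that structure using only java.util.concurrent, with no gRPC or Beam classes. Everything in it is a hypothetical stand-in: `ErrorObserver` plays the role of the stub's response observer, `callbackExecutor` plays the role of the executor that delivers callbacks, and a single latch shared by all tasks is an assumption on my side. Unlike the real test, it waits with a timeout so that a lost callback shows up as a failure rather than a hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchStructureSketch {
  /** Hypothetical stand-in for a stub client's response observer. */
  interface ErrorObserver {
    void onError(Throwable t);
  }

  /**
   * Starts one task per client. Each task wires onError to a shared latch,
   * triggers the error from within the same task, then waits on the latch.
   * Returns true only if every task saw the latch reach zero in time.
   */
  static boolean runOnce(int clients, long timeoutSeconds) throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(clients);
    CountDownLatch allDone = new CountDownLatch(clients);
    // Stand-in for the executor that delivers callbacks in the real test.
    ExecutorService callbackExecutor = Executors.newCachedThreadPool();
    ExecutorService testTasks = Executors.newFixedThreadPool(clients);
    try {
      for (int i = 0; i < clients; i++) {
        testTasks.execute(() -> {
          ErrorObserver observer = t -> latch.countDown();
          // The error is sent in the same task that created the "stub".
          callbackExecutor.execute(() -> observer.onError(new RuntimeException("simulated")));
          try {
            // If any single callback is lost, every task stays blocked here.
            if (latch.await(timeoutSeconds, TimeUnit.SECONDS)) {
              allDone.countDown();
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      return allDone.await(timeoutSeconds + 1, TimeUnit.SECONDS);
    } finally {
      testTasks.shutdownNow();
      callbackExecutor.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("completed=" + runOnce(3, 5));
  }
}
```

With a shared latch, one missing onError call is enough to block all waiting tasks, which would match the symptom described below.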

Unfortunately, it occurs only in Beam's Jenkins CI environment (where a run takes ~1 hour to finish). I cannot reproduce the problem locally.

From the trace log and the previous thread dump, it seems that the grpc-default-executor threads stop processing tasks: the thread dump showed no "grpc-default-executor" threads in the JVM while the test was waiting for them to count down a CountDownLatch, and one of the latches is never counted down. As a result, the test threads wait forever on the remaining latch. I cannot tell why the "grpc-default-executor" threads stop working (or disappear?).
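
One thing I could do in CI is capture the state of these threads from inside the test when a latch wait times out, instead of relying on an external thread dump. Below is a rough sketch using only `Thread.getAllStackTraces()` from the JDK. The class and method names are hypothetical, and filtering on the "grpc-default-executor" name prefix is an assumption based on the names seen in earlier dumps. If the executor is a cached thread pool (I believe it is, but have not verified this for every version), idle workers exit after 60 seconds, so zero matching threads may only mean that no task was queued, not that a thread died.

```java
import java.util.Map;

public class ExecutorThreadDump {
  /** Returns the name, state, and stack of every live thread whose name starts with the prefix. */
  static String dump(String namePrefix) {
    StringBuilder sb = new StringBuilder();
    int matched = 0;
    for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
      Thread thread = entry.getKey();
      if (!thread.getName().startsWith(namePrefix)) {
        continue;
      }
      matched++;
      sb.append(thread.getName()).append(" state=").append(thread.getState()).append('\n');
      for (StackTraceElement frame : entry.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    // Put the count first so an empty result is explicit in the log.
    sb.insert(0, matched + " thread(s) with prefix \"" + namePrefix + "\"\n");
    return sb.toString();
  }

  public static void main(String[] args) {
    // In a test this would run when latch.await(timeout, unit) returns false.
    System.err.print(dump("grpc-default-executor"));
  }
}
```

Calling `dump` only after a timed `await` returns false keeps the normal test output unchanged and records the executor state at the moment of the hang.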

Do you know how to troubleshoot such a situation?
