
[SPARK-51966][PYTHON] Replace select.select() with select.poll() when running on POSIX os #50774

Open: 1 commit to merge into base branch master

wjszlachta-man

What changes were proposed in this pull request?

On glibc-based Linux systems, select() can only monitor file descriptors whose numbers are less than FD_SETSIZE (1024).

This is an unreasonably low limit for many modern applications.

This PR replaces select.select() with select.poll() when running on a POSIX OS.
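As an illustration only (this is not the diff from this PR), the replacement could look roughly like the sketch below. The helper name `wait_readable` is hypothetical, and the `hasattr(select, "poll")` check is an assumed way of detecting a platform where poll() is available; the PR itself may detect POSIX differently. The fallback branch mirrors the `select.select([self.rfile], [], [], 1)` call seen in the traceback in this description.

```python
import select
import socket


def wait_readable(fileobj, timeout_s=1.0):
    """Return True if fileobj becomes readable within timeout_s seconds.

    Hypothetical helper, not part of pyspark. Uses poll() where the
    platform provides it, otherwise falls back to select().
    """
    if hasattr(select, "poll"):
        # poll() takes the descriptor itself rather than a fixed-size
        # bit set, so FD_SETSIZE does not apply.
        poller = select.poll()
        poller.register(fileobj, select.POLLIN)
        # poll() expects the timeout in milliseconds.
        return bool(poller.poll(int(timeout_s * 1000)))
    # Fallback for platforms without poll(); timeout is in seconds.
    readable, _, _ = select.select([fileobj], [], [], timeout_s)
    return bool(readable)
```

Note the unit difference: select.select() takes its timeout in seconds, while poll() takes milliseconds, so a 1 second wait becomes 1000.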

Why are the changes needed?

When running via PySpark, we frequently observe:

Exception occurred during processing of request from ('127.0.0.1', 46334)
Traceback (most recent call last):
  File "/usr/lib/python3.11/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.11/socketserver.py", line 348, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.11/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.11/socketserver.py", line 755, in __init__
    self.handle()
  File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 293, in handle
    poll(authenticate_and_accum_updates)
  File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 266, in poll
    r, _, _ = select.select([self.rfile], [], [], 1)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: filedescriptor out of range in select()

On POSIX systems, poll() should be used instead of select(), because poll() is not restricted to descriptor numbers below FD_SETSIZE.
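The difference can be reproduced outside Spark with a small POSIX-only sketch. It duplicates one end of a socket pair onto a descriptor number of at least 1024 and then tries both calls. The function name `demo_high_fd` is hypothetical, the constant 1024 is the glibc FD_SETSIZE value quoted above (assumed, not queried), and the function returns None when the process is not allowed to open a descriptor that high.

```python
import fcntl
import os
import resource
import select
import socket

FD_SETSIZE = 1024  # glibc value quoted in this PR; assumed, not queried


def demo_high_fd():
    """Return (select_raised, poll_ready), or None if no high fd is possible."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = FD_SETSIZE + 16
    if soft != resource.RLIM_INFINITY and soft < wanted:
        # Raise the soft limit as far as the hard limit allows.
        new_soft = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            return None
    a, b = socket.socketpair()
    high_fd = None
    try:
        try:
            # F_DUPFD returns the lowest free descriptor >= FD_SETSIZE.
            high_fd = fcntl.fcntl(a.fileno(), fcntl.F_DUPFD, FD_SETSIZE)
        except OSError:
            return None
        b.sendall(b"x")  # make the duplicated descriptor readable
        try:
            select.select([high_fd], [], [], 0)
            select_raised = False
        except ValueError:
            # Same error as in the traceback above.
            select_raised = True
        poller = select.poll()
        poller.register(high_fd, select.POLLIN)
        poll_ready = bool(poller.poll(1000))  # timeout in milliseconds
        return select_raised, poll_ready
    finally:
        if high_fd is not None:
            os.close(high_fd)
        a.close()
        b.close()
```

In a long-running driver process the same condition arises naturally once enough files and sockets are open that a new connection is assigned a descriptor number of 1024 or higher.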

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests + manual run on YARN cluster (Linux).

Was this patch authored or co-authored using generative AI tooling?

No

wjszlachta-man force-pushed the spark-51966-replace-select-with-poll-on-posix branch from 98d5e56 to d3fa95a on May 2, 2025, 19:34