Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lldpad eloop select change to poll #120

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

huyizhen
Copy link

In an environment with 576 virtual NICs, when the function eloop_sock_table_dispatch calls FD_ISSET, the value of file descriptor table->table[i].sock exceeds 1024 (the kernel fds structure has only 1024 bits). As a result, the glibc determines that a buffer overflow occurs and aborts the process.

To solve this problem, we change select to poll because poll has no restriction on the fd size. Poll allows file descriptors to listen to different events, such as POLLIN and POLLOUT. Therefore, no need to create different queues for different events, we delete writers and exceptions and rename readers to sock_table.

@penguin359
Copy link
Contributor

@huyizhen Please fix the build failures and push an update.

@huyizhen
Copy link
Author

@huyizhen Please fix the build failures and push an update.

Fixed

Copy link
Contributor

@penguin359 penguin359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some issues with how the list of pollfds are being constructed and passed to poll(). There should be one entry in the array for each requested fd, but no more. The current logic creates one entry for every fd up to max used so it will have entries for stdin, stderr, and other fds that are not desired. I also don't think those slots are being initialized after realloc if they don't map to a specific fd that is in the sock table.

@huyizhen
Copy link
Author

Great! I have modified the code according to your issues. Please review the latest code again. Thank you.
How to trigger the build? Since I found that after git push - f, the build was not automatically triggered.

@huyizhen
Copy link
Author

Dear penguin359, is there anything else I need to do to merge this PR?@penguin359

@huyizhen huyizhen requested a review from penguin359 December 31, 2024 01:29
@penguin359
Copy link
Contributor

This PR touches quite a few lines of code and now it is modifying 4 different files. It will take a little time to review and some maintainers are currently still out for the holidays.

@huyizhen
Copy link
Author

Thanks for your reply, I will wait patiently for the review, happy holidays!

@apconole
Copy link
Contributor

apconole commented Jan 8, 2025

I've added some comments. I think this needs to be broken into a few commits. There are renames and other types of cleanups interspersed with the actual meat of the change. Please do your best to organize the changes so that each commit will be a separate logical change.

@huyizhen
Copy link
Author

huyizhen commented Jan 9, 2025

I've added some comments. I think this needs to be broken into a few commits. There are renames and other types of cleanups interspersed with the actual meat of the change. Please do your best to organize the changes so that each commit will be a separate logical change.

Thank you for your comments, I've changed the code as much as possible, but there are a few more comments, including free(fds), the location of poll.h, ReadyToWrite, and I'd like to discuss it with you further.
Switching from select to poll does bring a lot of changes, and I've reverted the previous code as much as possible to make the patch easier to read.
I think the logic in the current patch is all necessary. In other words, if the separation continues, it will be difficult for future readers to understand. For example, retaining some FD_XX interfaces does not affect the commit, but people may have doubts when reading this patch, why the author didn't remove all FD_XX interfaces?
Do you have a better way or suggestion for separating commit?
Looking forward to your next review, thank you!

@huyizhen huyizhen requested a review from apconole January 10, 2025 01:11
@huyizhen
Copy link
Author

Dear apconole and penguin359, could you please continue to review this PR? I have resolved or replied to the current issues, thank you! @apconole @penguin359

@penguin359
Copy link
Contributor

One of my concerns with this PR that I saw in my initial assessment was that it collapsed the event handler into being indexed only by file descriptor instead of file descriptor and event type. That means that it could overwrite a different handler waiting on the file descriptor for a different event. The last handler to wait would win. However, after taking a detailed look through the original code, I see that neither EVENT_TYPE_WRITE nor EVENT_TYPE_EXCEPTION has ever been used by any handlers anywhere in Git history, only EVENT_TYPE_READ. That could be considered over-engineering and there's no current need to keep it. I still want to further review it, but that concern has been alleviated.

@huyizhen
Copy link
Author

One of my concerns with this PR that I saw in my initial assessment was that it collapsed the event handler into being indexed only by file descriptor instead of file descriptor and event type. That means that it could overwrite a different handler waiting on the file descriptor for a different event. The last handler to wait would win. However, after taking a detailed look through the original code, I see that neither EVENT_TYPE_WRITE nor EVENT_TYPE_EXCEPTION has ever been used by any handlers anywhere in Git history, only EVENT_TYPE_READ. That could be considered over-engineering and there's no current need to keep it. I still want to further review it, but that concern has been alleviated.

Thank you for your comments, looking forward to your review!

Copy link
Contributor

@penguin359 penguin359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking the time to work on this.

After a thorough review, the code looks functionally correct. I do have a few small changes I've recommended in code layout, but no changes needed to the logic.

In an environment with 576 virtual NICs, when the function eloop_sock_table_dispatch calls FD_ISSET,
the value of file descriptor table->table[i].sock exceeds 1024 (the kernel fds structure has only 1024 bits).
As a result, the glibc determines that a buffer overflow occurs and aborts the process.

To solve this problem, we change select to poll because poll has no restriction on the fd size.
Poll allows file descriptors to listen to different events, such as POLLIN and POLLOUT.
Therefore, no need to create different queues for different events, we delete writers and exceptions
and rename readers to sock_table.
@penguin359
Copy link
Contributor

I'll say it's looking pretty good right now and seems to be easy to follow the changes. I'll try testing this out and add a formal approval if it all is working.

@apconole
Copy link
Contributor

@penguin359 were you planning on acking this?

@penguin359
Copy link
Contributor

Yes, let me take one last look at the lastest version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants