Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broker: add FLUX_IPADDR_INTERFACE to select broker network interface #5707

Merged
merged 4 commits into from
Jan 30, 2024

Conversation

garlick
Copy link
Member

@garlick garlick commented Jan 26, 2024

Problem: during scale testing, it was observed that Flux was using an unstable network fabric, but there is no way to tell it to use another one.

Add the FLUX_IPADDR_INTERFACE environment variable. It can be set to an interface name in the broker's environment, and then PMI bootstrap will select an address associated with that interface.

@garlick garlick changed the title WIP: add FLUX_IPADDR_INTERFACE to select broker network interface broker: add FLUX_IPADDR_INTERFACE to select broker network interface Jan 29, 2024
@garlick
Copy link
Member Author

garlick commented Jan 29, 2024

I added a test and removed WIP.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This was handy during scaling tests when one of the networks was unstable. So approved!

@garlick
Copy link
Member Author

garlick commented Jan 30, 2024

Thanks! I got a test failure here in one builder:

2024-01-29T21:31:11.7353288Z expecting success: 
2024-01-29T21:31:11.7353647Z 	FLUX_IPADDR_INTERFACE=badiface test_expect_code 137 flux start \
2024-01-29T21:31:11.7354230Z 		${ARGS} --test-exit-timeout=1s -s2 -o,-Stbon.prefertcp=1 true
2024-01-29T21:31:11.7354553Z 
2024-01-29T21:31:11.7355252Z test_expect_code: command exited with 0, we wanted 137 flux start -o,-Sbroker.rc1_path=,-Sbroker.rc3_path= --test-exit-timeout=1s -s2 -o,-Stbon.prefertcp=1 true
2024-01-29T21:31:11.7356145Z not ok 89 - FLUX_IPADDR_INTERFACE=badiface fails

I wasn't sure if maybe setting the environment variable before a call to a shell function was perhaps not portable (the failure was in Ubunto 20.04)? So I pushed a small change to run the test in a subshell and export the variable within that subshell.

(Now that I describe the problem and note the OS it seems doubful that this was the issue.... there may be something else going on? Let's see how it goes this time in CI)

@garlick
Copy link
Member Author

garlick commented Jan 30, 2024

That test failure disappeared so I'll go ahead and set MWP. If it pops up again we can deal with it.

Problem: there is no way to force a flux instance to use a
specific network interface for overlay communication.

Add an environment variable: FLUX_IPADDR_INTERFACE.
If set to an interface name, that will select the preferred network.
Problem: FLUX_IPADDR_INTERFACE is not documented.

Add it to the envrironment man page.
Problem: there is no test coverage for the FLUX_IPADDR_INTERFACE
environment variable.

Add a couple checks.
Problem: when ipaddr_getprimary() fails during PMI bootstrap, the
broker prints the textual error plus, unnecessarily, the errno string.

This results in messages like this:
flux-broker: could not find address of badiface: Success

Drop the errno string suffix.
Copy link

codecov bot commented Jan 30, 2024

Codecov Report

Merging #5707 (fa4fdf5) into master (556d42c) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5707      +/-   ##
==========================================
+ Coverage   83.44%   83.46%   +0.01%     
==========================================
  Files         486      486              
  Lines       82892    82897       +5     
==========================================
+ Hits        69170    69187      +17     
+ Misses      13722    13710      -12     
Files Coverage Δ
src/broker/boot_pmi.c 68.26% <100.00%> (+1.10%) ⬆️
src/common/libutil/ipaddr.c 81.08% <100.00%> (+10.32%) ⬆️

... and 8 files with indirect coverage changes

@mergify mergify bot merged commit 596b073 into flux-framework:master Jan 30, 2024
33 of 34 checks passed
@garlick garlick deleted the ipaddr_interface branch January 30, 2024 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants