Acquire neighboring router MAC dynamically #619

PlagueCZ · 2024-10-17T22:29:09Z

Currently dpservice acquires neighboring router MAC addresses only during initialization.

This is a problem not only when a port is replaced on a switch (so the MAC changes), but also in multiport-eswitch mode with pf1-proxy, where the MAC is taken from the proxy port. Without dpservice running, no neighbors are registered (the proxy interface is effectively dead at that point), so the MAC has to be loaded dynamically some time after startup.

Thus I added a mechanism: when the neighboring router MAC is missing, a timer is started and the retrieval is retried later.

The timer uses exponential backoff, starting at 1s and doubling on each retry until it reaches 60s.
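For readers who want the backoff spelled out, here is a minimal sketch of the delay calculation described above. The helper name and macros are hypothetical and are not taken from the dpservice sources; only the 1s start, the doubling, and the 60s cap come from this description.

```c
#include <stdint.h>

/* Values from the PR description; the macro names are made up for this sketch. */
#define NEIGHMAC_DELAY_MIN_S 1u
#define NEIGHMAC_DELAY_MAX_S 60u

/* Hypothetical helper: given the delay used for the last attempt,
 * return the delay to use for the next retry (doubling, capped at 60s). */
static uint32_t neighmac_next_delay(uint32_t current_s)
{
	if (current_s < NEIGHMAC_DELAY_MIN_S)
		return NEIGHMAC_DELAY_MIN_S;
	/* Compare before multiplying so the result can never exceed the cap. */
	if (current_s >= NEIGHMAC_DELAY_MAX_S / 2)
		return NEIGHMAC_DELAY_MAX_S;
	return current_s * 2;
}
```

Starting from 1s, repeated calls yield 2, 4, 8, 16, 32 and then stay at 60.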

This PR is best read commit by commit, since that shows the evolution of all the changes.

Connected to #606

There was also a need for the init container to fail when some of the values for dp_service.conf are not available. I lumped that in here because there were already changes to the prepare script, but it can be separated if needed.

guvenc

A documentation-related request; details are in the review comment. Thanks!

guvenc · 2024-10-18T09:33:37Z

docs/deployment/help_dpservice-bin.md

@@ -7,6 +7,7 @@
 | --pf0 | IFNAME | first physical interface (e.g. eth0) |  |
 | --pf1 | IFNAME | second physical interface (e.g. eth1) |  |
 | --pf1-proxy | IFNAME | VF representor to use as a proxy for pf1 packets |  |
+| --pf1-proxy-vf | IFNAME | VF interface of the pf1-proxy VF representor |  |


Are there combinations that do not make sense with the newly introduced flags?
For example, if I set pf1-proxy, do I need to set pf1-proxy-vf as well? And if I set multiport-eswitch, do I need the pf1-proxy-* switches, so that without them I cannot operate meaningfully?

If so, can we document these dependencies and enforce them during command-line argument parsing, returning meaningful hints?
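To make the request concrete, a check of this kind could look roughly like the sketch below. The struct, the function name, and the `--multiport-eswitch` flag spelling are hypothetical, and the rules encode only the dependencies guessed at in the questions above; whether they match what dpservice actually requires is not confirmed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the parsed options; not dpservice's real struct. */
struct mpesw_opts {
	bool multiport_eswitch;    /* multiport-eswitch mode requested */
	const char *pf1_proxy;     /* --pf1-proxy value, or NULL if not given */
	const char *pf1_proxy_vf;  /* --pf1-proxy-vf value, or NULL if not given */
};

/* Returns NULL when the combination looks usable, otherwise a hint for the user.
 * The rules below are assumptions taken from the review questions. */
static const char *mpesw_opts_check(const struct mpesw_opts *o)
{
	if (o->pf1_proxy && !o->pf1_proxy_vf)
		return "--pf1-proxy requires --pf1-proxy-vf";
	if (o->pf1_proxy_vf && !o->pf1_proxy)
		return "--pf1-proxy-vf requires --pf1-proxy";
	if (o->multiport_eswitch && !o->pf1_proxy)
		return "--multiport-eswitch requires --pf1-proxy and --pf1-proxy-vf";
	return NULL;
}
```

Returning the hint as a string keeps the check separate from the parser, so the parser can print it and exit with an error.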

Also, could the documentation include an example dpservice-bin command with all the command-line parameters needed for a simple mpesw setup?
Maybe I overlooked it in the documentation, but as it stands it is not clear, at least to me, how to make mpesw work and which parameters are needed.

These command-line arguments are supposed to be generated by prepare.sh, right? Even so, an example showing which parameters dpservice-bin needs to operate successfully in mpesw mode would be helpful, along with how (with which parameters) I am supposed to call prepare.sh to operate in mpesw mode.

Signed-off-by: Guvenc Gulce <[email protected]>

guvenc

Please see comment details.

guvenc · 2024-10-22T11:10:06Z

src/monitoring/dp_event.c

@@ -88,6 +85,12 @@ void dp_process_event_link_msg(struct rte_mbuf *m)
 	}

 	port->link_status = status;
+	DPS_LOG_INFO("PF link state changed", DP_LOG_LINKSTATE(port->link_status), DP_LOG_PORT(port));
+
+	if (status == RTE_ETH_LINK_UP)


I think we should move this part to dp_link_status_change_event_callback. To me this looks like an unnecessary round trip from the worker thread back to the main thread. Calling these functions here also means the worker thread makes the netlink call directly, which is not ideal, since the worker thread should minimize any work that is not packet processing.
Another problem with this approach is that the worker thread will now enqueue to the monitoring queue, which is supposed to be enqueued only by the main thread.
This worsens a problem we already have in the code: monitoring_queue is currently enqueued by both the main thread and the interrupt thread (dp_link_status_change_event_callback). Now we add a third thread (the worker thread) that enqueues to the monitoring queue, even though the queue is created as single-producer, single-consumer.

So in a nutshell:

During init of the PF ports we can arm a single-shot timer that runs on the main thread and calls netlink; it re-arms itself if netlink fails and sends a mac_neigh event with the MAC address if netlink succeeds. This is already done this way.

In dp_link_status_change_event_callback we can arm the same single-shot timer on the main thread when the link state is reported as up, and let the timer callback call netlink, re-arm if netlink fails, and send the mac_neigh event with the MAC address if netlink succeeds. This ensures that the mac_neigh event is enqueued in main thread context.

Also in dp_link_status_change_event_callback, stop the timer if the link state is reported as "down".

With this approach we would at least not worsen the situation in the current code. The interrupt thread enqueuing to the monitoring queue together with the main thread can be addressed separately, as it was not caused by this PR.
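The proposed flow can be sketched as a small state machine in plain C. All names here are hypothetical and no DPDK timer or netlink call is made; the lookup callback merely stands in for the netlink query, and the 1s/60s values are taken from the PR description rather than from this comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port retry state; not dpservice's real data structure. */
struct neighmac_timer {
	bool armed;        /* a single-shot timer is currently pending */
	uint32_t delay_s;  /* delay to use for the pending timer */
	bool mac_known;    /* lookup succeeded; a mac_neigh event would be sent */
};

/* Stand-in for the netlink neighbor query: returns true when a MAC was found. */
typedef bool (*neighmac_lookup_fn)(void);

/* Link-state callback: arm on "up", stop the timer on "down". */
static void neighmac_on_link_change(struct neighmac_timer *t, bool link_up)
{
	t->armed = link_up;
	if (link_up)
		t->delay_s = 1;
}

/* Timer callback: on success report the MAC and stay disarmed,
 * on failure re-arm with a doubled delay capped at 60s. */
static void neighmac_on_timer(struct neighmac_timer *t, neighmac_lookup_fn lookup)
{
	if (!t->armed)
		return;
	if (lookup()) {
		t->mac_known = true;  /* the mac_neigh event would be enqueued here */
		t->armed = false;
	} else {
		t->delay_s = t->delay_s >= 30 ? 60 : t->delay_s * 2;
	}
}

/* Stub lookups used only to exercise the sketch. */
static bool lookup_fail(void) { return false; }
static bool lookup_ok(void) { return true; }
```

Because the event is produced only inside the timer callback, every mac_neigh enqueue happens in whichever thread runs that callback, which is the point of the proposal.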

PlagueCZ · 2024-11-01

I moved the indicated code away from the worker thread to the "interrupt" thread (the one where the timer fires).

I think this should be enough. If we only ever wanted to call the MAC retrieval code from one thread (the interrupt thread) and never from the main thread, we would have to remove the ability to "get the MAC immediately" in the init phase and always wait at least a second for it. I do not believe that is needed, as the real problem here was the worker doing unnecessary work, which is now fixed.

guvenc · 2024-11-05

I also think that the change is sufficient for this PR.

PlagueCZ added 5 commits October 17, 2024 18:13

Retrieve initial link status for PFs

547e5a3

More verbose port startup and link state

8cae391

Set neighboring router MAC on link state change

818d49d

Check for neighboring mac with exponential backoff

1e8379e

Make prepare script end on error

722a29d

PlagueCZ marked this pull request as ready for review October 17, 2024 22:29

PlagueCZ requested a review from a team as a code owner October 17, 2024 22:29

github-actions bot added the documentation, enhancement, and size/L labels Oct 17, 2024

guvenc requested changes Oct 18, 2024

PlagueCZ added 2 commits October 19, 2024 00:01

Add better cmdline argument checks for multiport-eswitch

438850f

More multiport-eswitch documentation for running dpservice

77bb2f1

PlagueCZ force-pushed the feature/dynamic_neighmac branch from 0c48b9c to 77bb2f1 Compare October 18, 2024 22:01

guvenc added 2 commits October 21, 2024 20:00

Adjust number of VFs based on mpesw flag in prepare.sh

cde58a9

Signed-off-by: Guvenc Gulce <[email protected]>

Correct licensing information

630ea2d

Signed-off-by: Guvenc Gulce <[email protected]>

guvenc requested changes Oct 22, 2024

Move MAC resolving away from worker thread

0ebf723

guvenc approved these changes Nov 5, 2024

guvenc merged commit 0ebf723 into main Nov 5, 2024
6 checks passed

guvenc deleted the feature/dynamic_neighmac branch November 5, 2024 13:44
