[Bug report] dataplane poll with default port_number doesn't work when there are multiple ptf nn agents #207

Open
w1nda opened this issue Dec 5, 2024 · 5 comments

@w1nda
Contributor

w1nda commented Dec 5, 2024

Source code: #

self, device_number=0, port_number=None, timeout=None, exp_pkt=None, filters=[]

Simplified code:

    def poll(
        self, device_number=0, port_number=None, timeout=None, exp_pkt=None, filters=[]
    ):
        def grab():
            self.logger.debug("Grabbing packet")
            for rcv_port_number, pkt, time in self.packets(device_number, port_number):
                rcv_device_number = device_number
                if not exp_pkt or match_exp_pkt(exp_pkt, pkt):
                    return DataPlane.PollSuccess(
                        rcv_device_number, rcv_port_number, pkt, exp_pkt, time
                    )
            return None

        with self.cvar:
            ret = ptfutils.timed_wait(self.cvar, grab, timeout=timeout)

        return ret

The problem is:
When multiple ptf nn agents are connected to the dataplane, there is an equivalent number of "device numbers". However, when we call poll with port_number=None and leave device_number at its default of 0, it only polls packets from the device with number 0.

That doesn't make sense. When exp_pkt is not None and port_number is None, we should poll packets from all devices.
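
The behavior described above can be reproduced with a toy model. This is only a sketch: `ToyDataPlane`, `inject`, and the `(device, port)` queue layout are hypothetical stand-ins and are not the ptf `DataPlane` implementation.

```python
from collections import defaultdict


class ToyDataPlane:
    """Hypothetical stand-in for DataPlane; it only models per-device queues."""

    def __init__(self):
        # (device_number, port_number) -> list of received packets
        self.queues = defaultdict(list)

    def inject(self, device_number, port_number, pkt):
        self.queues[(device_number, port_number)].append(pkt)

    def poll(self, device_number=0, port_number=None, exp_pkt=None):
        # port_number=None scans every port, but only on the one device
        # selected by device_number (which defaults to 0).
        for dev, port in sorted(self.queues):
            if dev != device_number:
                continue
            if port_number is not None and port != port_number:
                continue
            for pkt in self.queues[(dev, port)]:
                if exp_pkt is None or pkt == exp_pkt:
                    return (dev, port, pkt)
        return None


dp = ToyDataPlane()
dp.inject(device_number=1, port_number=3, pkt=b"hello")

# The packet sits on device 1, so the default call never sees it.
missed = dp.poll(exp_pkt=b"hello")
# Naming the device explicitly finds it.
found = dp.poll(device_number=1, exp_pkt=b"hello")
```

With the defaults, the caller has to know in advance which device the packet will arrive on, which is the limitation this issue is about.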

@jafingerhut
Collaborator

jafingerhut commented Dec 5, 2024

Would it be even more flexible and potentially useful if the following options were provided to callers of poll?

  1. The current behavior, where the caller can specify a single device number to perform the poll operation on.
  2. A new behavior, perhaps indicated by providing a new special named value as the device_number parameter to mean "all devices", that behaves as you suggest?

It seems worth preserving a way to get behavior 1, in case someone really wants it.

@w1nda
Contributor Author

w1nda commented Dec 6, 2024

Thanks for the help.

Yes, behavior 1 is worth preserving.

Behavior 2 is what I need, and I agree with your proposal: we can reserve a special value for device_number to mean "all devices". Since "all ports" is already indicated by None, I think None is a good option.
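
A minimal sketch of that idea, assuming a toy queue layout of `{(device, port): [packets]}`. The free function `poll` here is hypothetical and only illustrates the proposed semantics; it is not the ptf code.

```python
def poll(queues, device_number=0, port_number=None, exp_pkt=None):
    """Toy poll over {(device, port): [packets]}; not the ptf implementation.

    Assumption: device_number=None means "all devices", by analogy with
    port_number=None meaning "all ports".
    """
    for dev, port in sorted(queues):
        if device_number is not None and dev != device_number:
            continue
        if port_number is not None and port != port_number:
            continue
        for pkt in queues[(dev, port)]:
            if exp_pkt is None or pkt == exp_pkt:
                return (dev, port, pkt)
    return None


queues = {(0, 1): [b"a"], (1, 3): [b"b"]}

# device_number=None scans every device and finds the packet on device 1.
all_devices_hit = poll(queues, device_number=None, exp_pkt=b"b")
# The default (device_number=0) keeps today's single-device behavior.
default_miss = poll(queues, exp_pkt=b"b")
```

Keeping 0 as the default means existing callers see no change; only callers that pass None explicitly opt in to the wider search.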

@jafingerhut
Collaborator

Are you willing to write a PR with a proposed change, for others to review?

Ideally it would be best if such a change was backwards compatible, i.e. except for the new case of device_number=None with new behavior, all existing cases should behave the same when device_number != None.
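
One way to check that kind of backward compatibility is to compare the old and new behavior for every non-None device_number. The sketch below does this on toy functions; `poll_old`, `poll_new`, and the queue layout are hypothetical and stand in for the real code before and after a patch.

```python
import itertools


def poll_old(queues, device_number=0, port_number=None):
    # Toy version of the current behavior: exactly one device is scanned.
    for dev, port in sorted(queues):
        if dev == device_number and port_number in (None, port):
            if queues[(dev, port)]:
                return (dev, port, queues[(dev, port)][0])
    return None


def poll_new(queues, device_number=0, port_number=None):
    # Toy version of the proposal: device_number=None matches any device.
    for dev, port in sorted(queues):
        if device_number in (None, dev) and port_number in (None, port):
            if queues[(dev, port)]:
                return (dev, port, queues[(dev, port)][0])
    return None


queues = {(0, 1): [b"a"], (1, 3): [b"b"]}

# For every non-None device_number, old and new must agree.
mismatches = [
    (dev, port)
    for dev, port in itertools.product([0, 1, 2], [None, 1, 3])
    if poll_old(queues, dev, port) != poll_new(queues, dev, port)
]

# Only the new None case is allowed to behave differently.
new_only = poll_new(queues, device_number=None, port_number=3)
```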

@w1nda
Contributor Author

w1nda commented Dec 17, 2024

Yes, I'm happy to code that.

However, it seems that there are no tests in the ptf repo, so I think it is better to patch the ptf package in the sonic repo first. Once the patch works well and has been tested thoroughly, I will backport it to the ptf repo.

@jafingerhut
Collaborator

That sounds like a reasonable plan.

lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this issue Jan 19, 2025
…iple ptf nn agents connection (#21070)

When testing sonic with a ptf dataplane connected to multiple ptf nn agents, some cases fail because the packet queues in ptf are not polled thoroughly. This is a bug or missing feature in ptf: p4lang/ptf#207.
As a short-term quick fix, this PR patches the ptf-py3 package to unblock our qualification process.
VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this issue Jan 21, 2025
mssonicbld added a commit to mssonicbld/sonic-buildimage-msft that referenced this issue Jan 21, 2025
…iple ptf nn agents connection

#### Why I did it
When testing sonic with a ptf dataplane connected to multiple ptf nn agents, some cases fail because the packet queues in ptf are not polled thoroughly. This is a bug or missing feature in ptf: p4lang/ptf#207.
As a short-term quick fix, this PR patches the ptf-py3 package to unblock our qualification process.

#### How I did it
Support polling all devices in the ptf dataplane.

#### How to verify it
Run tests that use the ptf dataplane on a testbed.

BYGX-wcr pushed a commit to BYGX-wcr/sonic-buildimage that referenced this issue Jan 21, 2025