Code update #21

Open
wants to merge 7,907 commits into
base: master

Conversation

tiantianlv
Owner

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

oleksandrivantsiv and others added 21 commits July 11, 2024 20:20
- Why I did it
Add the possibility to compile BFB (BlueField boot stream) image for the nvidia-bluefield platform.
The BFB image fully overwrites the DPU's SSD and can be used for DPU recovery.

- How I did it
Add new bfb image type to build_image.sh script. Add scripts under the "nvidia-bluefield" platform to create the binary file.

- How to verify it
Configure nvidia-bluefield platform. Run make target/sonic-nvidia-bluefield.bfb command to compile the image.
- Why I did it
To add support for the arm64-nvda_bf-bf3comdpu platform

- How I did it
Added arm64-nvda_bf-bf3comdpu directory to the devices

Signed-off-by: Yakiv Huryk <[email protected]>
Why I did it
In #15058, the NTP server configuration was modified to add additional options, such as conditionally enabling iburst, specifying the association type, and specifying the NTP version. One side effect of this was that the iburst option which was previously always enabled now requires it to be explicitly enabled in the config_db.

Fixes #19425.

How I did it
To restore the old behavior, when loading from minigraph, add "iburst": "true" for each NTP server loaded from minigraph.

How to verify it
Tested on a KVM setup, verified that the generated ntp.conf file had the iburst option.

Signed-off-by: Saikrishna Arcot <[email protected]>
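
The NTP change described above can be illustrated with a minimal sketch. It assumes the parsed minigraph yields a plain list of server addresses and that the NTP server table maps each address to a dictionary of options; the function name `ntp_servers_from_minigraph` and the sample addresses are hypothetical and not taken from the actual parser.

```python
def ntp_servers_from_minigraph(server_addresses):
    """Build an NTP server table from a list of server addresses.

    Every server gets "iburst": "true" so that the generated ntp.conf
    keeps the iburst option, as it did before #15058.
    The table layout used here is an assumption for illustration.
    """
    return {address: {"iburst": "true"} for address in server_addresses}


# Hypothetical addresses, standing in for servers read from a minigraph.
ntp_table = ntp_servers_from_minigraph(["10.0.0.1", "10.0.0.2"])
```

Servers configured directly in config_db are unaffected by this sketch; only entries loaded from the minigraph receive the default.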
… instead of 'voq' and fix few echo msgs. (#19527)

use 'SpineRouter' in checks instead of 'voq' and fix few echo msgs.

Signed-off-by: fountzou <[email protected]>
…utomatically (#19415)

#### Why I did it
src/sonic-host-services
```
* 02d9b55 - (HEAD -> master, origin/master, origin/HEAD) Added support to render template format of `delayed` flag on Feature Table. (#135) (28 hours ago) [abdosi]
* 60fdfea - Fixed determine/process reboot-cause service dependency (#17406) (#132) (13 days ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog
In this PR, we add a new PR checker called onboarding dualtor. This new PR checker is for onboarding dualtor-specific test scripts.
Why I did it
Revert BGP suppress FIB pending due to unresolved FRR issues in current version

Work item tracking
Microsoft ADO (number only):
How I did it
Revert it

How to verify it
Build and run
…tically (#19459)

#### Why I did it
src/sonic-dash-api
```
* 5809048 - (HEAD -> master, origin/master, origin/HEAD) [build]Update pool sonicbld to sonic-ububtu-1c since it's deprecated (#21) (10 days ago) [Jianquan Ye]
```
#### How I did it
#### How to verify it
#### Description for the changelog
*Fix: #18818

Handle any exception in API get_service_from_feature_table() gracefully.

---------

Signed-off-by: Abhishek Dosi <[email protected]>
…utomatically (#19568)

#### Why I did it
src/sonic-host-services
```
* cfb3cb8 - (HEAD -> master, origin/master, origin/HEAD) Modified scripts: replaced deprecated 'logger' with 'syslogger' (#136) (2 hours ago) [Ashwin Srinivasan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tomatically (#19548)

#### Why I did it
src/sonic-linux-kernel
```
* 88c1826 - (HEAD -> master, origin/master, origin/HEAD) [Micas] Kernel configuration is enabled to support mdio-gpio and spi-gpio (#411) (27 hours ago) [Philo]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…atically (#19522)

#### Why I did it
src/sonic-utilities
```
* b6f7c2b7 - (HEAD -> master, origin/master, origin/HEAD) [sfputil] Add loopback sub-command for debugging and module diagnosti… (#3369) (27 hours ago) [Xinyu Lin]
* 1f944447 - Fix multi-asic behaviour for pg-drop (#3058) (2 days ago) [bktsim]
* 789ef634 - Add Parallel option for apply-patch (#3373) (3 days ago) [Xincun Li]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (#19513)

#### Why I did it
src/sonic-gnmi
```
* a85dfc1 - (HEAD -> master, origin/master, origin/HEAD) Subscribe COUNTERS_DB (#268) (4 days ago) [ganglv]
* 1e90d23 - Update gnmi-native to support subscribe poll mode (#267) (5 days ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#19458)

#### Why I did it
src/linkmgrd
```
* 6fcab47 - (HEAD -> master, origin/master, origin/HEAD) [active-standby] raising log level to notice for timed oscillation config change (#262) (28 hours ago) [Jing Zhang]
* 2b7e4f9 - [active-standby] Fix the oscillation logic (#261) (11 days ago) [Longxiang Lyu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…9486)

* make asic_name case sensitive

* address comment
Why I did it
After the running process list is obtained with psutil, some of the processes may exit. Calling name() or cmdline() on an exited process raises a NoSuchProcess exception.
Fix issue #19507

How I did it
Add try/except handling in process execution
Change func get_target_process_cmds to get_target_process for reuse

How to verify it
UT passed
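
A minimal sketch of the try/except pattern described above. So that it runs without psutil installed, it uses two hypothetical stand-ins: `NoSuchProcess` plays the role of `psutil.NoSuchProcess`, and `FakeProcess` plays the role of `psutil.Process`. The helper name `collect_process_info` is also an assumption and is not the function changed in the commit.

```python
class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess (assumption, for illustration)."""


class FakeProcess:
    """Hypothetical stand-in for psutil.Process."""

    def __init__(self, name, cmdline, exited=False):
        self._name = name
        self._cmdline = cmdline
        self._exited = exited

    def name(self):
        if self._exited:
            raise NoSuchProcess(self._name)
        return self._name

    def cmdline(self):
        if self._exited:
            raise NoSuchProcess(self._name)
        return self._cmdline


def collect_process_info(processes):
    """Return (name, cmdline) pairs, skipping processes that have exited."""
    collected = []
    for proc in processes:
        try:
            collected.append((proc.name(), proc.cmdline()))
        except NoSuchProcess:
            # The process exited between listing and inspection; skip it.
            continue
    return collected


snapshot = [
    FakeProcess("syncd", ["/usr/bin/syncd"]),
    FakeProcess("orchagent", ["/usr/bin/orchagent"], exited=True),
]
running = collect_process_info(snapshot)
```

With real psutil objects, the same loop would catch `psutil.NoSuchProcess` around the `name()` and `cmdline()` calls.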
Why I did it
[Security] Fix krb5 CVE-2024-37370

Work item tracking
Microsoft ADO (number only): 28432951
How I did it
Upgrade krb5 to version 1.18.3-6+deb11u5+fips
Why I did it
This PR updates the MMU related configurations on Arista 7060X6-PE device with 256x200G breakout.

Work item tracking
Microsoft ADO (number only): 28707303
How I did it
This PR updates 3 things:

- Updated bcm file for optimal MMU settings.
- Updated buffer defaults to accommodate the TH5 architecture with 1 ingress pool + 1 egress pool and updated to optimal value.
- Updated PG lookups for buffer setups.

How to verify it

Tested with sonic-mgmt tests with xoff/xon tests with updated QoS parameter: https://github.com/sonic-net/sonic-mgmt/pull/13656/files
Local ixia test is passing in lab.
Both verified using 202311 branch for backporting.
Why I did it
Upgrade the xgs SAI version to 10.1.35.0 to include the following changes:

10.1.21.0: Update the ACL implementation to enable the ACL switch bind to support PFCWD on MACSEC devices
10.1.23.0: Handle the FDR stats get not supported more properly.
10.1.24.0: (CSP CS00012316286) Fix LPM Miss ACL with counters fails
10.1.25.0: Improve ECMP performance
10.1.28.0: Fix SAI is not honoring the bcm_linkscan_interval set in .bcm config file for Ramon and error with setting SAI_NEIGHBOR_ENTRY_ATTR_IS_LOCAL attribute for neighbor
10.1.29.0: Fix fabric switch initialization delayed if SW linkscan enabled during platform init
10.1.30.0: (CSP CS00012356911) Fix traffic drops on all priorities when PFC asserted on two priorities.
10.1.31.0: NHG performance fix
10.1.33.0: (CSP CS00012345242) High CPU due to SAI code calls bcm_port_resource_get for every fabric error counter
10.1.35.0: Support MMU configuration updates for TH5.

Work item tracking
Microsoft ADO (number only): 28724885
#### Why I did it
Update sonic-snmpagent submodule to include below commit:
a281f9a [ciscoPfcExtMIB]: Remove returning first intf index if subid is empty (#322)
d532923 Modify path of python-wheels package to use bookworm (#324)
…lly (#19571)

#### Why I did it
src/sonic-gnmi
```
* 015de94 - (HEAD -> master, origin/master, origin/HEAD) Update gnmi-native to support subscribe stream mode (#271) (2 days ago) [ganglv]
* ccce9a2 - Return GNMI API error when ZMQ operation failed. (#270) (2 days ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld and others added 30 commits September 21, 2024 19:01
…lly (#20310)

#### Why I did it
src/sonic-swss
```
* 9a6a86fe - (HEAD -> master, origin/master, origin/HEAD) Add FEC correctable bit error count to ports_stat_ids (#3291) (28 hours ago) [Prince George]
* 5126f88b - Fix the SAI status check for unsupported API on SAI_SWITCH_ATTR_SUPPO… (#3282) (32 hours ago) [DavidZagury]
```
#### How I did it
#### How to verify it
#### Description for the changelog
## How I did it

Remove the eventd enabled/slim image check from the Dockerfile (build time). As part of the Dockerfile, expose the eventd_enabled and slim image flags to ENV, which will be used by docker_init/start.sh to check if the rsyslog plugin should be moved to rsyslog.d

#### How to verify it

Manual test/pipeline
### Why I did it

#19179 removed call to publish_events when memory usage container exceeds threshold, causing test_events to fail.

### How I did it

Add back call to publish_events

#### How to verify it

Manual test
- Why I did it
Booting master/202405 images on SN5400 and SN5600 platforms results in the following log message:

2024-07-23T21:01:07.972045+00:00 sonic kernel: [    0.110657] x86/cpu: SGX disabled by BIOS.
2024-07-23T21:01:07.972045+00:00 sonic kernel: [    0.110657] x86/cpu: SGX disabled by BIOS.
The processor on these systems supports SGX but the BIOS does not. The SGX feature is also not important, so disable it using the nosgx kernel boot parameter.

- How I did it

- How to verify it
Added the parameter, rebooted, and checked the log
…ion (#20085)

[Mellanox] Update asic table template for shared headroom pool relevant information 

Signed-off-by: Stephen Sun <[email protected]>
…restarts (#20291)

Why I did it
When swss is restarted, let's cleanup the state db tunnel and decap term entries.

Signed-off-by: Longxiang Lyu [email protected]

Work item tracking
Microsoft ADO (number only): 29507113

How I did it
flush the state db tunnel and decap term entries.

Signed-off-by: Longxiang Lyu <[email protected]>
- Why I did it
Introduce new Mellanox SIMX platform SN5640.
Number of up/down links - 256/256.
Default cable length for t0 - 40m.
Default cable length for t1 - 300m.
Default speed - 100G.

- How I did it
Added all relevant files of new SKU used for ASIC simulation

- How to verify it
Check that the SKU is up and running on simulation environment
* Update DNX SAI to 11.2.9.1
* Update sai-modules.mk

[SAI_BRANCH rel_ocp_sai_11_2] [CSP CS00012352844] Backport SONIC-92490 to rel_ocp_sai_11_2

JIRA# SONIC-92490

Issue Summary: Added missing DoNOtLearn action to knet port trap
- Why I did it
To add a possibility to disable SONiC containers and run a user-provided data-plane application.

- How I did it
Added Nvidia platform-specific script sonic-byo.py

- How to verify it
Manual test

---------

Signed-off-by: Yakiv Huryk <[email protected]>
Why I did it
Fix permission issue on docker-ptf /var/run/sshd.
sonic-net/sonic-mgmt#13545

Work item tracking
Microsoft ADO (number only): 29632874
How I did it
How to verify it
Check docker-ptf in this build. https://dev.azure.com/mssonic/build/_build/results?buildId=653508&view=results
… automatically (#20371)

#### Why I did it
src/sonic-platform-common
```
* daeed65 - (HEAD -> master, origin/master, origin/HEAD) Added new Platform APIs and modified APIs for supporting reboot on a SmartSwitch (#501) (9 hours ago) [Vasundhara Volam]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (#20339)

#### Why I did it
src/sonic-swss
```
* 3c230d2b - (HEAD -> master, origin/master, origin/HEAD) [Orchagent] Add optional create_switch timeout parameter (#3258) (3 days ago) [Pavan Naregundi]
* be3a15f6 - Update CODEOWNERS VNET and ACL Orch (#3296) (3 days ago) [siqbal1986]
* 002cd256 - Close socket descriptor in checkPortIffUp. (#3263) (3 days ago) [mint570]
* 69cf0872 - [orchagent]: Skip installing ACL counter when ACL mirror rule is inactive (#3223) (3 days ago) [fountzou]
* 008f2865 - [crm][dash] Do not probe DASH resources on devices other than the DPU. (#3297) (3 days ago) [Oleksandr Ivantsiv]
* 971dfc1a - Add support for PACKET_ACTION_COPY (#3288) (4 days ago) [Devesh Pathak]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…ning (#19947)

#### Why I did it
We encountered an error in the log, which came from a call to logrotate while rsyslog was not running.

### How I did it
Configure logrotate so that the HUP signal is sent only when rsyslog is running.
…atically (#20344)

#### Why I did it
src/sonic-utilities
```
* 94ec7108 - (HEAD -> master, origin/master, origin/HEAD) Enhance multi-asic support for queuestat (#3554) (29 hours ago) [HP]
* 688c1d1a - [dpu_tty]: Add a DPU TTY console utility (#3535) (3 days ago) [Wenchung Wang]
* b8f306f3 - [Nokia] Add J2C+/H3/H4/H5 to GCU validator (#3495) (3 days ago) [Dylan Godwin]
* 695cc9a7 - Upgrade pyroute2 and improve cli response time (#3513) (4 days ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Signed-off-by: [email protected]

Cisco platform 202405.0.5 release

Why I did it
Cisco platform 202405.0.5 release

Work item tracking
Microsoft ADO (number only):
- Why I did it
Integrate HW-MGMT 7.0040.1011 Changes

- How I did it
Run make integrate-mlnx-hw-mgmt

- How to verify it
Build an image and run tests from "sonic-mgmt".
…8715)

[FRR FPM] Introduce new FRR-SONiC communication channel (FPM SONiC module).

Signed-off-by: Carmine Scarpitta <[email protected]>
…atically (#20392)

#### Why I did it
src/sonic-utilities
```
* 66b41e5f - (HEAD -> master, origin/master, origin/HEAD) [fast/warm-reboot] Improve retry mechanism to check if SAI_OBJECT_TYPE_ACL_ENTRY entries are in redis (#3548) (10 hours ago) [Andriy Yurkiv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…omatically (#20374)

#### Why I did it
src/sonic-swss-common
```
* 898aa5d - (HEAD -> master, origin/master, origin/HEAD) Add VRF support to ZMQ server/client (#920) (26 hours ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…D automatically (#20403)

#### Why I did it
src/sonic-platform-daemons
```
* 604e454 - (HEAD -> master, origin/master, origin/HEAD) Improve parsing of media-settings.json for non-CMIS and breakout ports (#533) (22 hours ago) [longhuan-cisco]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tomatically (#20045)

#### Why I did it
src/sonic-linux-kernel
```
* b2f73b6 - (HEAD -> master, origin/master, origin/HEAD) include adm1275 config within kconfig for usage on cisco platform (#432) (35 hours ago) [Gregory Boudreau]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…atically (#20411)

#### Why I did it
src/sonic-utilities
```
* 008a078a - (HEAD -> master, origin/master, origin/HEAD) Add Unit Test for portstat (#3564) (5 hours ago) [Changrong Wu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tically (#20410)

#### Why I did it
src/sonic-sairedis
```
* 24843d41 - (HEAD -> master, origin/master, origin/HEAD) [Mellanox] Resolve New Line Formatting Issues in syncd's sai.profile (#1412) (14 hours ago) [Tomer Shalvi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…omatically (#20409)

#### Why I did it
src/sonic-mgmt-common
```
* b91a4df - (HEAD -> master, origin/master, origin/HEAD) PortChannel Interface Static Support - OpenConfig Yang (#142) (9 hours ago) [Satoru Shinohara]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
The system_port_id generation was based on the loop below

for interface in **interface_metadata.findall**(str(QName(ns1, "DeviceInterfaceMetadata"))):

"DeviceInterfaceMetadata" is defined per interface in the DeviceInfo section of the minigraph, and in this loop we increment system_port_id so that each interface has a unique ID.

The for loop iterated over the interface_metadata list extracted by the findall() API matching the tag **"DeviceInterfaceMetadata"** [lxml.etree._Element](https://lxml.de/api/lxml.etree._Element-class.html). findall() doesn't guarantee document order. Hence the interface list, and the system_port IDs generated from it, may not match across the config_db's of different linecards in a chassis.

When SYSTEM_PORT table entries mismatch across linecards, a few linecards behave erratically: packet error interrupts fire continuously and the IBGP sessions with peer ASICs in other linecards are not established.

### How I did it
Add logic to sort the system_ports dictionary by key (e.g. "str-sonic-lc03|ASIC0|Ethernet120") and assign the system_port_id incrementally.

This makes sure the system_port_ids in the SYSTEM_PORT table in config_db match across all linecards/ASICs.

Thanks to @abdosi and @vmittal-msft for triaging and arriving at this solution.

#### How to verify it

Verified by manually patching this logic into the minigraph parser on a SONiC T2 chassis and making sure the dockers, interfaces, IBGP, and EBGP come up on all linecards across the chassis.
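
The sorted assignment described in this change can be sketched as follows. This is an illustration under stated assumptions, not the minigraph parser code: the function name `assign_system_port_ids`, the starting ID of 1, and the `speed` field are hypothetical. Only the key format is taken from the description.

```python
def assign_system_port_ids(system_ports, first_id=1):
    """Assign system_port_id values in sorted-key order.

    Sorting by key makes the result independent of the order in which
    the entries were collected. Returns a new dictionary.
    The starting ID is an assumption for illustration.
    """
    assigned = {}
    for offset, key in enumerate(sorted(system_ports)):
        entry = dict(system_ports[key])
        entry["system_port_id"] = str(first_id + offset)
        assigned[key] = entry
    return assigned


# The same two ports, collected in different orders on two linecards.
ports_seen_by_lc03 = {
    "str-sonic-lc03|ASIC0|Ethernet120": {"speed": "100000"},
    "str-sonic-lc01|ASIC0|Ethernet0": {"speed": "100000"},
}
ports_seen_by_lc01 = dict(reversed(list(ports_seen_by_lc03.items())))

ids_lc03 = assign_system_port_ids(ports_seen_by_lc03)
ids_lc01 = assign_system_port_ids(ports_seen_by_lc01)
```

Both linecards end up with identical IDs because the assignment depends only on the set of keys, not on the order in which they were discovered.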
### Why I did it

Fix #20048

The ticket above explains how the sonic-cfggen calls in hostname.sh, pcie-check.sh and banner-config hog the CPU and slightly delay the start of swss.

**pcie-check.sh**
The redis-cli call is also replaced with sonic-db-cli, since redis-cli is a wrapper around the actual redis-cli in the database container.
<img width="1113" alt="image" src="https://github.com/user-attachments/assets/bed9d055-3b9f-4f75-aabd-1e0335716396">

**swss start**
If the SKU has create_only_config_db_buffers.json set to true, the only config that's updated today is:
```
{
    "DEVICE_METADATA": {
        "localhost": {
            "create_only_config_db_buffers": "true"
        }
    }
}
```
We use sonic-cfggen, which causes a 1.5 sec delay in the start of the swss container. Thus replace it with sonic-db-cli. If a more complex use case arises in the future, this can be updated.
<img width="1483" alt="Screenshot 2024-09-16 at 12 45 05 PM" src="https://github.com/user-attachments/assets/ee3248b5-7623-42c4-9b50-81b114c71ae7">
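
As a rough illustration of why a single field write is enough here, the sketch below flattens a small CONFIG_DB JSON fragment into one argument list per field. The exact sonic-db-cli invocation used in the change is not shown in this description, so the `sonic-db-cli CONFIG_DB hset <table>|<key> <field> <value>` form and the helper name `hset_commands` are assumptions. The sketch only builds the argument lists and does not execute anything.

```python
import json


def hset_commands(config_json):
    """Turn a small CONFIG_DB JSON fragment into sonic-db-cli argument lists.

    One "hset" command is produced per field. The command form is an
    assumption for illustration, not taken from the actual change.
    """
    config = json.loads(config_json)
    commands = []
    for table, keys in config.items():
        for key, fields in keys.items():
            for field, value in fields.items():
                commands.append(
                    ["sonic-db-cli", "CONFIG_DB", "hset",
                     f"{table}|{key}", field, value]
                )
    return commands


fragment = (
    '{"DEVICE_METADATA": {"localhost": '
    '{"create_only_config_db_buffers": "true"}}}'
)
commands = hset_commands(fragment)
```

For the fragment above this yields exactly one command, which is the case where replacing a full sonic-cfggen run is cheapest.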


### How I did it

#### How to verify it

**Note: Everything was tested on MSN2700 device, Intel Celeron CPU with 2 cores**

- 1.5 sec saved in the swss container start
<img width="1298" alt="image" src="https://github.com/user-attachments/assets/817739f1-26a2-41ad-89e4-1e76e31532ac">

- 1.6 sec saved in the start of the swss service. Previously, it took almost 1.8 sec after config-setup finished for swss to start. After replacing the calls, it starts almost 0.2 sec after config-setup finishes

In total, anywhere between 2.5 - 3.5 sec is saved
Added Dynamic port breakout support for S5248F DellEMC platform

How I did it
Modified the default hwsku profile for DPB support and updated the platform.json file with breakout options.
Baseline implementation for 256x100g support in the Arista-7060X6-64PE (DCS-7060X6-64PE) has been requested by MSFT.

How I did it
Added the necessary files for baseline implementation of 256x100g implementation for Quicksilver OSFP.
…lly (#20424)

#### Why I did it
src/sonic-swss
```
* 8b99543f - (HEAD -> master, origin/master, origin/HEAD) Fix portmgr write partial port config to app DB issue. (#3304) (33 hours ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog