[VOQ] Fabric orchagent exit in Supervisor #15321

Closed
judyjoseph opened this issue Jun 3, 2023 · 18 comments
Assignees
Labels
Chassis 🤖 Modular chassis support chassis-voq Voq chassis changes Issue for 202205 P0 Priority of the issue Triaged this issue has been triaged

Comments

@judyjoseph
Contributor

judyjoseph commented Jun 3, 2023

Description

The orchagent instance controlling a fabric ASIC exits on the Nokia chassis supervisor because of a TIMEOUT error. This is seen on a chassis with all the fabric cards inserted.

CPU usage is high, and "get:SAI_OBJECT_TYPE_PORT" messages are logged continuously in syslog.

Steps to reproduce the issue:

  1. Boot the chassis and observe the issue on the supervisor.

Describe the results you received:

May 30 07:54:10.454677 svcstr--sup-1 ERR syncd2#syncd: :- threadFunction: time span WD exceeded 30283 ms for SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000181
May 30 07:54:10.454702 svcstr--sup-1 ERR syncd2#syncd: :- logEventData: op: SET, key: FABRIC_PORT_STAT_COUNTER:oid:0x1000000000181
May 30 07:54:10.454702 svcstr--sup-1 ERR syncd2#syncd: :- logEventData: fv: PORT_COUNTER_ID_LIST: SAI_PORT_STAT_IF_OUT_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS,SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_OUT_OCTETS,SAI_PORT_STAT_IF_IN_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_ERRORS,SAI_PORT_STAT_IF_IN_OCTETS
May 30 07:54:24.705958 svcstr--sup-1 ERR syncd9#syncd: :- threadFunction: time span WD exceeded 30273 ms for SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000184
May 30 07:54:24.705958 svcstr--sup-1 ERR syncd9#syncd: :- logEventData: op: SET, key: FABRIC_PORT_STAT_COUNTER:oid:0x1000000000184
May 30 07:54:24.705958 svcstr--sup-1 ERR syncd9#syncd: :- logEventData: fv: PORT_COUNTER_ID_LIST: SAI_PORT_STAT_IF_OUT_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS,SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_OUT_OCTETS,SAI_PORT_STAT_IF_IN_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_ERRORS,SAI_PORT_STAT_IF_IN_OCTETS
May 30 07:54:28.366243 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:_brcm_sai_port_wred_stats_get:15065 port gport get failed with error Feature unavailable (0xfffffff0).
May 30 07:54:28.366449 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:brcm_sai_get_port_stats:5187 port wred stats get failed with error -2. 
May 30 07:54:28.366508 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:_brcm_sai_port_wred_stats_get:15065 port gport get failed with error Feature unavailable (0xfffffff0).
May 30 07:54:28.366557 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:brcm_sai_get_port_stats:5187 port wred stats get failed with error -2. 
May 30 07:54:28.685003 svcstr--sup-1 ERR syncd9#syncd: :- setEndTime: event 'SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000184' took 34253 ms to execute
May 30 07:54:28.685003 svcstr--sup-1 ERR syncd9#syncd: :- logEventData: op: SET, key: FABRIC_PORT_STAT_COUNTER:oid:0x1000000000184
May 30 07:54:28.685003 svcstr--sup-1 ERR syncd9#syncd: :- logEventData: fv: PORT_COUNTER_ID_LIST: SAI_PORT_STAT_IF_OUT_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS,SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_OUT_OCTETS,SAI_PORT_STAT_IF_IN_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_ERRORS,SAI_PORT_STAT_IF_IN_OCTETS
May 30 07:54:28.709385 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:_brcm_sai_port_wred_stats_get:15065 port gport get failed with error Feature unavailable (0xfffffff0).
May 30 07:54:28.709436 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:brcm_sai_get_port_stats:5187 port wred stats get failed with error -2. 
May 30 07:54:28.709487 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:_brcm_sai_port_wred_stats_get:15065 port gport get failed with error Feature unavailable (0xfffffff0).
May 30 07:54:28.709539 svcstr--sup-1 ERR syncd9#syncd: [-bdb:5:1] SAI_API_PORT:brcm_sai_get_port_stats:5187 port wred stats get failed with error -2. 
May 30 07:54:35.872260 svcstr--sup-1 ERR syncd0#syncd: :- threadFunction: time span WD exceeded 30969 ms for SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000182
May 30 07:54:35.872316 svcstr--sup-1 ERR syncd0#syncd: :- logEventData: op: SET, key: FABRIC_PORT_STAT_COUNTER:oid:0x1000000000182
May 30 07:54:35.872368 svcstr--sup-1 ERR syncd0#syncd: :- logEventData: fv: PORT_COUNTER_ID_LIST: SAI_PORT_STAT_IF_OUT_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS,SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_OUT_OCTETS,SAI_PORT_STAT_IF_IN_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_ERRORS,SAI_PORT_STAT_IF_IN_OCTETS
May 30 07:54:38.298868 svcstr--sup-1 ERR syncd0#syncd: :- setEndTime: event 'SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000182' took 33398 ms to execute
May 30 07:54:38.300775 svcstr--sup-1 ERR syncd0#syncd: :- logEventData: op: SET, key: FABRIC_PORT_STAT_COUNTER:oid:0x1000000000182
May 30 07:54:38.863110 svcstr--sup-1 ERR syncd0#syncd: [-bdb:1:0] SAI_API_PORT:_brcm_sai_port_wred_stats_get:15065 port gport get failed with error Feature unavailable (0xfffffff0).
May 30 07:54:38.863207 svcstr--sup-1 ERR syncd0#syncd: [-bdb:1:0] SAI_API_PORT:brcm_sai_get_port_stats:5187 port wred stats get failed with error -2. 
May 30 07:54:38.863306 svcstr--sup-1 ERR syncd0#syncd: [-bdb:1:0] SAI_API_PORT:_brcm_sai_port_wred_stats_get:15065 port gport get failed with error Feature unavailable (0xfffffff0).
May 30 07:54:38.863397 svcstr--sup-1 ERR syncd0#syncd: [-bdb:1:0] SAI_API_PORT:brcm_sai_get_port_stats:5187 port wred stats get failed with error -2. 
May 30 07:54:40.229355 svcstr--sup-1 ERR swss2#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
May 30 07:54:40.229481 svcstr--sup-1 ERR swss2#orchagent: :- wait: failed to get response for getresponse
May 30 07:54:43.353386 svcstr--sup-1 ERR syncd2#syncd: :- setEndTime: event 'SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000181' took 63181 ms to execute
May 30 07:54:43.353507 svcstr--sup-1 ERR syncd2#syncd: :- logEventData: op: SET, key: FABRIC_PORT_STAT_COUNTER:oid:0x1000000000181
May 30 07:54:43.353566 svcstr--sup-1 ERR syncd2#syncd: :- logEventData: fv: PORT_COUNTER_ID_LIST: SAI_PORT_STAT_IF_OUT_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS,SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_OUT_OCTETS,SAI_PORT_STAT_IF_IN_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_ERRORS,SAI_PORT_STAT_IF_IN_OCTETS
Jun  3 01:49:29.701348 svcstr--sup-1 NOTICE syncd0#syncd: :- threadFunction: time span 50 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000137'
Jun  3 01:49:29.713273 svcstr--sup-1 NOTICE syncd13#syncd: :- threadFunction: time span 345 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x100000000011b'
Jun  3 01:49:29.789858 svcstr--sup-1 NOTICE syncd15#syncd: :- threadFunction: time span 0 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000142'
Jun  3 01:49:29.817874 svcstr--sup-1 NOTICE syncd1#syncd: :- threadFunction: time span 264 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000125'
Jun  3 01:49:29.866492 svcstr--sup-1 NOTICE syncd10#syncd: :- threadFunction: time span 137 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x100000000012c'
Jun  3 01:49:29.954517 svcstr--sup-1 NOTICE syncd8#syncd: :- threadFunction: time span 114 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x100000000012f'
Jun  3 01:49:30.060723 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 365 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000117'
Jun  3 01:49:30.152688 svcstr--sup-1 NOTICE syncd2#syncd: :- threadFunction: time span 15 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000131'
Jun  3 01:49:30.296935 svcstr--sup-1 NOTICE syncd3#syncd: :- threadFunction: time span 81 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x100000000013b'
Jun  3 01:49:30.353488 svcstr--sup-1 NOTICE syncd11#syncd: :- threadFunction: time span 25 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x100000000012e'
Jun  3 01:49:30.404254 svcstr--sup-1 NOTICE syncd14#syncd: :- threadFunction: time span 166 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000100'
Jun  3 01:49:30.409476 svcstr--sup-1 NOTICE syncd6#syncd: :- threadFunction: time span 191 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x10000000000f6'
Jun  3 01:49:30.467450 svcstr--sup-1 NOTICE syncd12#syncd: :- threadFunction: time span 43 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000104'
Jun  3 01:49:30.504442 svcstr--sup-1 NOTICE syncd5#syncd: :- threadFunction: time span 57 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x100000000013d'
Jun  3 01:49:30.574298 svcstr--sup-1 NOTICE syncd4#syncd: :- threadFunction: time span 34 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x10000000000e4'
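
To see which syncd instances are affected, the slow SET events in a syslog like the one above can be grouped per instance. The following is a minimal sketch, not part of SONiC: the function name `slow_events_by_instance` and the 30000 ms threshold are assumptions chosen for illustration (the threshold roughly matches the "WD exceeded" values in the log), and the regular expression only targets the `setEndTime` message format shown above.

```python
import re
from collections import defaultdict

# Matches syncd "setEndTime" lines in the format shown in the log excerpt above.
SLOW_EVENT = re.compile(
    r"(?P<inst>syncd\d+)#syncd: :- setEndTime: event '(?P<event>[^']+)' "
    r"took (?P<ms>\d+) ms to execute"
)

def slow_events_by_instance(lines, threshold_ms=30000):
    """Group events that took at least threshold_ms by syncd instance.

    threshold_ms is an illustrative assumption, not a SONiC constant.
    """
    slow = defaultdict(list)
    for line in lines:
        m = SLOW_EVENT.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            slow[m.group("inst")].append((m.group("event"), int(m.group("ms"))))
    return dict(slow)

# Sample lines copied from the syslog excerpt in this issue.
sample = [
    "May 30 07:54:28.685003 svcstr--sup-1 ERR syncd9#syncd: :- setEndTime: event 'SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000184' took 34253 ms to execute",
    "May 30 07:54:38.298868 svcstr--sup-1 ERR syncd0#syncd: :- setEndTime: event 'SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000182' took 33398 ms to execute",
    "May 30 07:54:43.353386 svcstr--sup-1 ERR syncd2#syncd: :- setEndTime: event 'SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000181' took 63181 ms to execute",
    "Jun  3 01:49:29.701348 svcstr--sup-1 NOTICE syncd0#syncd: :- threadFunction: time span 50 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000137'",
]

result = slow_events_by_instance(sample)
for inst, events in sorted(result.items()):
    for event, ms in events:
        print(f"{inst}: {event} took {ms} ms")
```

On a real system the input would be the lines of `/var/log/syslog` rather than the inline sample.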

Describe the results you expected:

Output of show version:

show version 

SONiC Software Version: SONiC.C.20220531.27.05
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: 9e776925c2
Build date: Wed May 24 21:18:30 UTC 2023
Built by: cloudtest@8c5b0374c000000


@judyjoseph judyjoseph added chassis-voq Voq chassis changes Chassis 🤖 Modular chassis support labels Jun 3, 2023
@judyjoseph
Contributor Author

Adding more logs. With fabric port polling enabled, TIMEOUTs are seen on random swss/fabric ASIC instances on the supervisor of a fully populated chassis.

2023-06-05T02:59:22.0269718Z E               Jun  5 02:52:20.310548 svcstr--sup-1 ERR syncd10#syncd: :- setEndTime: event 'SET:FABRIC_PORT_STAT_COUNTER:oid:0x1000000000169' took 30378 ms to execute
2023-06-05T02:59:22.0270480Z E               
2023-06-05T02:59:22.0271485Z E               Jun  5 02:52:20.310653 svcstr--sup-1 ERR syncd10#syncd: :- logEventData: op: SET, key: FABRIC_PORT_STAT_COUNTER:oid:0x1000000000169
2023-06-05T02:59:22.0272268Z E               
2023-06-05T02:59:22.0273565Z E               Jun  5 02:52:20.310708 svcstr--sup-1 ERR syncd10#syncd: :- logEventData: fv: PORT_COUNTER_ID_LIST: SAI_PORT_STAT_IF_OUT_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS,SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES,SAI_PORT_STAT_IF_OUT_OCTETS,SAI_PORT_STAT_IF_IN_FABRIC_DATA_UNITS,SAI_PORT_STAT_IF_IN_ERRORS,SAI_PORT_STAT_IF_IN_OCTETS

@judyjoseph
Contributor Author

judyjoseph commented Jun 5, 2023

@saksarav-nokia @mlok-nokia to check.

@judyjoseph
Contributor Author

#define FABRIC_PORT_STAT_COUNTER_FLEX_COUNTER_GROUP "FABRIC_PORT_STAT_COUNTER"
#define FABRIC_PORT_STAT_FLEX_COUNTER_POLLING_INTERVAL_MS 10000
#define FABRIC_QUEUE_STAT_COUNTER_FLEX_COUNTER_GROUP "FABRIC_QUEUE_STAT_COUNTER"
#define FABRIC_QUEUE_STAT_FLEX_COUNTER_POLLING_INTERVAL_MS 100000

The fabric port polling interval is very aggressive: every 10 seconds we poll the counters for all fabric ports.

@judyjoseph judyjoseph self-assigned this Jun 7, 2023
@judyjoseph judyjoseph added the Triaged this issue has been triaged label Jun 7, 2023
@judyjoseph
Contributor Author

I see two areas that need optimization and fixes:

  1. On these J2C+ linecards, a longer delay is seen in get/set operations on fabric ports. We can follow up with Broadcom on this, @saksarav-nokia.

  2. In orchagent (fabric orch):
    (i) The counter poll is very aggressive: per https://github.com/sonic-net/sonic-swss/blob/7702466076f2998eceb86476595966a9cfea9a4d/orchagent/fabricportsorch.cpp#L19, stats for all fabric ports are polled every 10 sec.
    (ii) The call to updateFabricPortState() is heavy, and here it is redundant: https://github.com/sonic-net/sonic-swss/blob/master/orchagent/fabricportsorch.cpp#L360. This is because we already call updateFabricPortState() towards the end of getFabricPortList(), which is called in doTask().

@ngoc-do Could you please check?

@judyjoseph
Contributor Author

@arlakshm f.y.i

@rlhui rlhui added the P0 Priority of the issue label Jun 13, 2023
@saksarav-nokia
Contributor

@judyjoseph @arlakshm,
We analyzed the issue with fabric port stats polling; our findings are as follows.

  1. We have 16 Ramons and 192 fabric ports in each Ramon.

  2. When the supervisor is rebooted or a config reload is done, the swss and syncd dockers are started and switch_create is called for each Ramon.

  3. As soon as switch_create completes for a given swss/syncd, fabric port stats polling starts from the first fabric port and proceeds port by port. But because the CPU is very busy creating the switch for all 16 swss/syncd instances, and the polling interval is only 10000 ms, the polling cycle never completes for all 192 ports; the SAI API call get_port_stats is invoked only for the first set of fabric ports. While a cycle is still in progress, the next polling interval starts, the previous cycle is interrupted, and polling starts again from the very first port. This continues until the config reload completes and all the swss/syncd dockers are up and running, which takes ~5 minutes. After that, I see polling done every 30 secs (is this FABRIC_POLLING_INTERVAL_DEFAULT?) and all 192 ports are polled.

  4. The SAI API call get_port_stats, reading all 8 fabric port stats for a given fabric port, takes only a few ms in the normal state and also during boot and config reload.

  5. "threadFunction: time span" messages with higher values are printed for ports that were missed during the incomplete polling cycles described in bullet 3. We also noticed that this time span message is sometimes printed with a time value of 0; this needs to be addressed in swss/syncd as well.

So the only way to address this issue is to optimize the aggressive polling during bootup or config reload.

Thanks,
Sakthi
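
The restart behaviour described in bullet 3 can be illustrated with a toy model. This is only a sketch under stated assumptions, not the flex counter implementation: `simulate` is a hypothetical helper, ports are polled strictly in order, every new interval restarts at port 0, and the 150 ms per-port cost under boot-time CPU load is an invented figure for illustration (bullet 4 reports only a few ms for the SAI call itself). The 192 ports, the 10000 ms interval, and the ~5 minute window come from the analysis above.

```python
def simulate(n_ports, per_port_ms, interval_ms, total_ms):
    """Count how often each port is polled when every interval restarts at port 0.

    Toy model: a cycle polls ports in order and is cut off when the next
    polling interval begins; a cycle that finishes early idles until then.
    """
    counts = [0] * n_ports
    t = 0
    while t < total_ms:
        cycle_end = min(t + interval_ms, total_ms)
        now = t
        port = 0
        # Poll ports in order until the cycle is interrupted or all are done.
        while port < n_ports and now + per_port_ms <= cycle_end:
            counts[port] += 1
            now += per_port_ms
            port += 1
        t += interval_ms
    return counts

# 192 ports and a 10000 ms interval are from the thread; 150 ms per port is assumed.
aggressive = simulate(n_ports=192, per_port_ms=150, interval_ms=10000, total_ms=300000)
never_polled = sum(1 for c in aggressive if c == 0)
print(f"10 s interval: {never_polled} of 192 ports never polled in 5 minutes")

relaxed = simulate(n_ports=192, per_port_ms=150, interval_ms=60000, total_ms=300000)
print(f"60 s interval: every port polled {min(relaxed)} times in 5 minutes")
```

Under these assumptions the ports at the end of the list are never reached while the early ports are polled in every cycle, which matches the pattern of missed ports reported above.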

@saksarav-nokia
Contributor

cpm_syslog.log
Attached the syslog taken during config reload used for above analysis. Same behavior is seen during boot up.

@arlakshm
Contributor

@kenneth-arista, please take a look at this issue

@saksarav-nokia
Contributor

@kenneth-arista ,
We see that it takes ~30 secs to poll the counters of all 192 ports in each polling interval. We increased FABRIC_PORT_STAT_FLEX_COUNTER_POLLING_INTERVAL_MS to 60 secs, and this seems to help a lot. Except for the very first poll right after the config reload, every polling cycle covers all 192 ports and completes in 30 secs.

But it looks like there is another issue with the fabric port counters. Even though all 192 ports are polled in every polling cycle, and polling all 8 counters for a port takes ~0.1 secs or less, we still see "syncd0#syncd: :- threadFunction: time span" logs for a few random ports in every polling cycle. When would we expect to see this?
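
The numbers in this comment can be checked with a little arithmetic. The sketch below uses only figures reported in the thread (192 ports, a ~30 sec full cycle, ~0.1 sec or less per port, and the 10 sec and 60 sec intervals); the helper name `interval_utilization` is made up for illustration.

```python
# Figures reported in this thread.
PORTS_PER_ASIC = 192     # fabric ports polled per Ramon
FULL_CYCLE_S = 30.0      # observed time to poll all ports once
PER_PORT_READ_S = 0.1    # reported upper bound for reading 8 counters on one port

def interval_utilization(cycle_s, interval_s):
    """Fraction of one polling interval used by a full cycle (> 1 means overrun)."""
    return cycle_s / interval_s

# Average time per port implied by the observed full-cycle time.
implied_per_port_ms = FULL_CYCLE_S * 1000 / PORTS_PER_ASIC

# Time the reads alone would take if every port needed the full 0.1 sec.
read_time_upper_bound_s = round(PORTS_PER_ASIC * PER_PORT_READ_S, 1)

for interval_s in (10, 60):
    print(f"interval {interval_s} s: utilization "
          f"{interval_utilization(FULL_CYCLE_S, interval_s):.1f}")
print(f"implied per-port time: {implied_per_port_ms} ms")
print(f"upper bound on read time per cycle: {read_time_upper_bound_s} s")
```

With a 10 sec interval a full cycle needs three intervals' worth of time, so it cannot finish; with 60 secs it uses half the interval. The reads account for at most 19.2 secs of the ~30 sec cycle, so the remainder is not explained by the figures given here.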

@kenneth-arista
Contributor

@saksarav-nokia can you paste the output of show fabric reachability for your system?

I believe setting FABRIC_PORT_STAT_FLEX_COUNTER_POLLING_INTERVAL_MS to 60 secs is too slow and will negatively affect fabric link monitoring functionality.

@judyjoseph is correct in that there is a redundant call to updateFabricPortState() at the end of FabricPortsOrch::getFabricPortList(). Let's remove this.

We're gathering some data on our end. As a quick datapoint, we don't see orchagent restarts either with config reload or during initial boot. However, it is not a fair comparison, as we have fewer Ramons and fewer ports per Ramon.

Tagging @jfeng-arista for awareness

@saksarav-nokia
Contributor

fabric_reach.txt
@kenneth-arista , Please find the attached output of show fabric reachability from our cpm card which has 14 Ramons and 192 fabric ports in each.

@abdosi
Contributor

abdosi commented Jun 28, 2023

@kenneth-arista is working on a PR to remove the extra loop on updateFabricPortState.

@abdosi
Contributor

abdosi commented Jun 28, 2023

Also, should we look at enhancing [enable_counters.py](https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-orchagent/enable_counters.py) to delay the start of fabric port polling?

kenneth-arista added a commit to kenneth-arista/sonic-swss that referenced this issue Jul 7, 2023
Call to updateFabricPortState in FabricPortsOrch::getFabricPortList() is
redundant as FabricPortsOrch::doTask() already calls it.

This change helps mitigate the MHz spikes during boot up of the supe as
described in sonic-net/sonic-buildimage#15321.
@kenneth-arista
Contributor

@saksarav-nokia looking at your fabric_reach.txt output, not all 192 links are connected. On ASICs with connections, there are only 120 active links. Although Ramon supports up to 192 links, not all of them are used. There must be something else amiss in your setup that is causing these timeouts. We also use Ramon, but have 144 active links and haven't yet seen these timeouts.

To help mitigate orchagent restarts, I posted sonic-net/sonic-swss#2850 to remove the redundant code. However, let's gather more info on what stats are being polled and how long the operations take before changing the polling interval.

@saksarav-nokia
Contributor

@kenneth-arista, we have 16 Ramons with 192 SFM links in each Ramon. Since only 5 (out of 8) IMM cards are inserted in this chassis, only 120 SFM links are up. But I see that the SONiC fabric polling code polls the status of all 192 links even when only 120 links are up.

@kenneth-arista
Contributor

@saksarav-nokia can you propose a PR for changing the polling code? It's not productive for me to do it if I can neither test it nor reproduce the problem.

kenneth-arista added a commit to kenneth-arista/sonic-swss that referenced this issue Jul 12, 2023
judyjoseph pushed a commit to sonic-net/sonic-swss that referenced this issue Jul 13, 2023
@judyjoseph
Contributor Author

Another interesting observation (I have taken port 0x1000000000122 as the example below).

The SAI calls happen very close together, twice within consecutive seconds, with the second READ taking longer (1337 ms). We need to check whether there is some overlap, or whether the last polling cycle of the fabric ports had not completed when the next loop started.

admin@svcstr-7250-sup-1:/var/log$ sudo zgrep 0x1000000000122 syslog | grep syncd7
Jul 13 14:29:15.896266 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 272 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 14:33:18.096046 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 168 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 14:39:19.286386 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 121 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 14:39:45.309216 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 149 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 14:45:20.538341 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 272 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 14:45:47.559030 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 337 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 14:45:48.559118 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 1337 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 14:55:45.981219 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 7 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 15:07:45.440779 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 87 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
Jul 13 15:11:15.558521 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 18 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'
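
Pairs of closely spaced reads like the one above can be picked out of a syslog automatically. The following is a minimal sketch: `parse` and `close_pairs` are hypothetical helpers, the 2 second gap is an arbitrary choice, and the year passed to the parser is an assumption because syslog timestamps do not carry one (2023 is taken from the dates in this issue).

```python
import re
from datetime import datetime

# Matches the syncd "time span" lines for port GET operations shown above.
LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d{6}) .*time span (?P<ms>\d+) ms for "
    r"'get:SAI_OBJECT_TYPE_PORT:(?P<oid>oid:0x[0-9a-f]+)'"
)

def parse(line, year=2023):
    """Return (timestamp, oid, span_ms) or None; year is assumed, not logged."""
    m = LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S.%f")
    return ts, m.group("oid"), int(m.group("ms"))

def close_pairs(lines, max_gap_s=2.0):
    """List (oid, gap_seconds, span_ms) for same-OID reads logged within max_gap_s."""
    last_seen = {}
    pairs = []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, oid, span_ms = parsed
        prev = last_seen.get(oid)
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if 0 <= gap <= max_gap_s:
                pairs.append((oid, gap, span_ms))
        last_seen[oid] = ts
    return pairs

# Three consecutive lines copied from the output above.
sample = [
    "Jul 13 14:45:47.559030 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 337 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'",
    "Jul 13 14:45:48.559118 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 1337 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'",
    "Jul 13 14:55:45.981219 svcstr--sup-1 NOTICE syncd7#syncd: :- threadFunction: time span 7 ms for 'get:SAI_OBJECT_TYPE_PORT:oid:0x1000000000122'",
]

pairs = close_pairs(sample)
for oid, gap, span_ms in pairs:
    print(f"{oid}: logged again after {gap:.3f} s, span {span_ms} ms")
```

In the sample, the 1337 ms span equals the earlier 337 ms span plus roughly the 1 second between the two log lines. That would be consistent with both lines describing a single in-flight operation rather than two separate calls, but this is an inference from the numbers and is not confirmed in the thread.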

yxieca pushed a commit to sonic-net/sonic-swss that referenced this issue Jul 13, 2023
StormLiangMS pushed a commit to sonic-net/sonic-swss that referenced this issue Jul 19, 2023
theasianpianist pushed a commit to theasianpianist/sonic-swss that referenced this issue Jul 20, 2023
@judyjoseph
Contributor Author

Closing this issue, as we no longer see the orchagent exits with PR #2850. Fine-tuning of the counters is still needed for fabric ports; a new issue will be opened for that.

Projects
Status: Done

6 participants