[Bug]: bgp/test_startup_tsa_tsb_service.py::test_user_init_tsb_on_sup_while_service_run_on_dut failing. #14834

Open
vperumal opened this issue Oct 3, 2024 · 5 comments

@vperumal
Contributor

vperumal commented Oct 3, 2024

Issue Description

bgp/test_startup_tsa_tsb_service.py::test_user_init_tsb_on_sup_while_service_run_on_dut is failing because the TSA/TSB state is not synced to the Supervisor card.

The test reboots the supervisor and then initiates a TSB on the Supervisor to move all LCs from Maintenance mode to Normal mode. But since the startup_tsa_tsb service doesn't run on the Supervisor, the state on the RP is already Normal, which causes this failure.

cisco@sfd-lt2-sup:~$ sudo systemctl status startup_tsa_tsb.service
○ startup_tsa_tsb.service - STARTUP TSA-TSB SERVICE
Loaded: loaded (/lib/systemd/system/startup_tsa_tsb.service; enabled-runtime; preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Fri 2024-09-27 07:41:34 UTC; 7h ago

Sep 27 07:35:10 sfd-lt2-sup systemd[1]: startup_tsa_tsb.service - STARTUP TSA-TSB SERVICE was skipped because of an unmet condition check (ConditionPathExists=!/etc/sonic/chas>
Sep 27 07:41:34 sfd-lt2-sup systemd[1]: startup_tsa_tsb.service - STARTUP TSA-TSB SERVICE was skipped because of an unmet condition check (ConditionPathExists=!/etc/sonic/chas>

The test fails because it expects a password prompt when TSB is issued, but the chassis reports that it is already in Normal mode, so the prompt never appears. TSC itself reports the correct state of the LC.

def exec_tsa_tsb_cmd_on_supervisor(duthosts, enum_supervisor_dut_hostname, creds, tsa_tsb_cmd):
    """
    @summary: Issue TSA/TSB command on supervisor card using user credentials
    Verify command is executed on supervisor card
    @returns: None
    """
    try:
        suphost = duthosts[enum_supervisor_dut_hostname]
        sup_ip = suphost.mgmt_ip
        sonic_username = creds['sonicadmin_user']
        sonic_password = creds['sonicadmin_password']
        logger.info('sonic-username: {}, sonic_password: {}'.format(sonic_username, sonic_password))
        ssh_cmd = "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no {}@{}".format(sonic_username, sup_ip)
        connect = pexpect.spawn(ssh_cmd)
        connect.expect('.*[Pp]assword:')
        connect.sendline(sonic_password)
        i = connect.expect('{}@{}:'.format(sonic_username, suphost.hostname), timeout=60)
        pytest_assert(i == 0, "Failed to connect")
        connect.sendline(tsa_tsb_cmd)
        # Fails here: the password prompt never appears (see the 'sudo TSB' output below).
        connect.expect('.*[Pp]assword for username \'{}\':'.format(sonic_username))

connect = <pexpect.pty_spawn.spawn object at 0x7f5c2c568280>
creds = {'ansible_altpasswords': [], 'ansible_become_pass': 'roZes@123', 'ansible_ssh_pass': 'roZes@123', 'ansible_ssh_user': 'admin', ...}
duthosts = [, , , ]
enum_supervisor_dut_hostname = 'sfd-lt2-sup'
i = 0
sonic_password = 'cisco123'
sonic_username = 'cisco'
ssh_cmd = 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no cisco@1.75.44.200'
sup_ip = '1.75.44.200'
suphost =
tsa_tsb_cmd = 'sudo TSB'

cisco@sfd-lt2-sup:~$ sudo TSB
Chassis is already in Normal mode
Please execute 'sudo config save' to preserve System mode in Normal state after reboot or config reload on all linecards
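
A minimal sketch of how a helper could accept both outcomes seen here (the password prompt, or the "already in Normal mode" message) instead of waiting only for the prompt. It relies on pexpect's `expect()` accepting a list of patterns and returning the index of the one that matched. The function name, the return strings, and the assumption that `connect` is already logged in to the supervisor are illustrative and not taken from the test suite.

```python
import re

# Index values returned by expect() for the two patterns below.
PASSWORD_PROMPT = 0
ALREADY_NORMAL = 1


def run_tsb_on_supervisor(connect, username, password, timeout=60):
    """Send 'sudo TSB' and handle either outcome shown in this issue.

    `connect` is assumed to behave like a pexpect spawn object that is
    already logged in to the supervisor (hypothetical helper).
    """
    patterns = [
        r"[Pp]assword for username '{}':".format(re.escape(username)),
        r"Chassis is already in Normal mode",
    ]
    connect.sendline("sudo TSB")
    # With a list of patterns, expect() returns the index of the match.
    matched = connect.expect(patterns, timeout=timeout)
    if matched == PASSWORD_PROMPT:
        connect.sendline(password)
        return "tsb_issued"
    return "already_normal"
```

A caller could then assert on the returned string according to whether the supervisor was expected to be in TSA before the command.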

cisco@sfd-lt2-sup:~$ sudo TSC
Password for username 'cisco':
======== LINE-CARD0|None output: ========
BGP0 : System Mode: Maintenance
BGP1 : System Mode: Maintenance
BGP2 : System Mode: Maintenance
TSB : Pending (Time Remaining:657 seconds, service:startup_tsa_tsb.service)
The rates are calculated within 5 seconds period
IFACE      STATE  RX_OK  RX_BPS      RX_UTIL  RX_ERR  RX_DRP  RX_OVR  TX_OK  TX_BPS        TX_UTIL  TX_ERR  TX_DRP  TX_OVR
Ethernet0  U      0      7.65 B/s    0.00%    0       0       0       7      2383.27 KB/s  0.00%    0       0       0
Ethernet8  U      7      146.93 B/s  0.00%    0       0       0       0      2442.55 KB/s  0.00%    0       0       0
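
Since TSC reports the correct LC state, a test could read the state from its output rather than infer it from the TSB prompt. The sketch below parses only the line formats visible in the output above. The function name and the returned dictionary layout are illustrative assumptions, and other platforms or single-ASIC cards may print these lines differently.

```python
import re


def parse_tsc_output(text):
    """Extract per-BGP system modes and any pending-TSB timer from TSC text."""
    # Lines such as "BGP0 : System Mode: Maintenance"
    modes = dict(re.findall(r"^(BGP\d+)\s*:\s*System Mode:\s*(\w+)", text, re.M))
    # Line such as "TSB : Pending (Time Remaining:657 seconds, ...)"
    pending = re.search(r"TSB\s*:\s*Pending \(Time Remaining:\s*(\d+) seconds", text)
    return {
        "modes": modes,
        "tsb_pending_seconds": int(pending.group(1)) if pending else None,
    }


sample = """\
======== LINE-CARD0|None output: ========
BGP0 : System Mode: Maintenance
BGP1 : System Mode: Maintenance
BGP2 : System Mode: Maintenance
TSB : Pending (Time Remaining:657 seconds, service:startup_tsa_tsb.service)
"""
state = parse_tsc_output(sample)
```

Here `state["modes"]` maps each BGP instance to its reported mode, and `state["tsb_pending_seconds"]` is `None` when no pending TSB line is present.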

Results you see

The test fails because the expected password prompt never appears.

Results you expected to see

The expected behavior is unclear. Question to the community: do we expect the startup_tsa_tsb service to be running on the chassis? What is the behavior for other vendors?

Is it platform specific

generic

Relevant log output

No response

Output of show version

No response

Attach files (if any)

No response

@vperumal
Contributor Author

vperumal commented Oct 3, 2024

FYI @abdosi @anamehra @yejianquan

@Javier-Tan
Contributor

Hi @vperumal,

I have a related issue, #14843, open to fix the test failure caused by the password prompt when running TSA/TSB on sup cards. However, I'm also interested in the intended behaviour of test_user_init_tsb_on_sup_while_service_run_on_dut.

I will keep the PR as draft until we are clear on this.

@arlakshm for vis

@Javier-Tan Javier-Tan self-assigned this Oct 5, 2024
@Javier-Tan
Contributor

Javier-Tan commented Oct 5, 2024

The answer can be found in https://github.com/sonic-net/SONiC/blob/master/doc/voq/Reliable_TSA.md

If Supervisor tsa_enabled == FALSE,

  • Operational TSA state is controlled by LC tsa_enabled config (including the startup_tsa_tsb service if the LC reboots)

In the test, the supervisor starts with tsa_enabled == False, so performing TSB on it effectively does nothing and should not affect LCs that are running the startup_tsa_tsb service. Tests should be updated to reflect this.
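
As a rough illustration of that rule, the sketch below models the expected LC mode. Only the supervisor tsa_enabled == FALSE branch comes from the quoted design doc text above. The TRUE branch (a supervisor in TSA holds every LC in Maintenance) is an assumption, and the function and parameter names are hypothetical.

```python
def expected_lc_mode(sup_tsa_enabled, lc_tsa_enabled, lc_startup_service_pending):
    """Return the mode a line card is expected to report."""
    if not sup_tsa_enabled:
        # From the design doc quote: the LC's own config decides, including
        # the startup_tsa_tsb service while it is still pending.
        if lc_tsa_enabled or lc_startup_service_pending:
            return "Maintenance"
        return "Normal"
    # Assumption (not stated in this thread): supervisor TSA overrides the LC.
    return "Maintenance"


def sup_tsb_changes_lc_mode(sup_tsa_enabled, lc_tsa_enabled, lc_startup_service_pending):
    """True if issuing TSB on the supervisor is expected to change the LC mode."""
    before = expected_lc_mode(sup_tsa_enabled, lc_tsa_enabled, lc_startup_service_pending)
    # After TSB the supervisor has tsa_enabled == False.
    after = expected_lc_mode(False, lc_tsa_enabled, lc_startup_service_pending)
    return before != after
```

Under this model, a supervisor that already has tsa_enabled == False never changes the LC mode by issuing TSB, which matches the scenario in this test.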

@vperumal
Contributor Author

vperumal commented Oct 5, 2024

Thanks @Javier-Tan - I will add the check to see if tsa is enabled on supervisor

@Javier-Tan Javier-Tan assigned vperumal and unassigned Javier-Tan Oct 5, 2024
@Javier-Tan
Contributor

Thanks, if you could fix sup in TSB mode behaviour, I will cover the test gap for sup in TSA mode behaviour #14850
