
Switch Port Configuration

Shalom Toledo edited this page Dec 20, 2018 · 20 revisions
Table of Contents
  1. Port Identification
  2. Physical Port Identification
    1. Using udev Rules
    2. Using ethtool
    3. Using systemd
  3. Port Administrative State
  4. Port MTU
  5. Port Speed
  6. Port Statistics
    1. Software Statistics
    2. Hardware Statistics
      1. Notes
    3. Resetting Statistics
  7. Port Splitting
    1. Splitting
    2. Unsplitting
  8. Port Module Information
  9. Further Resources

Port Identification

Management ports do not use the same driver as front panel ports and can therefore be distinguished using the Linux ethtool utility.

The following is example output for a management port on a Mellanox SN2700 switch:

$ ethtool -i eth1
driver: e1000e
version: 3.2.6-k
firmware-version: 1.10-0
bus-info: 0000:06:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

The following is example output for a front panel port on a Mellanox SN2700 switch:

$ ethtool -i sw1p1
driver: mlxsw_spectrum
version: 1.0
firmware-version: 13.400.116
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

The outputs above show that the management port uses Intel's e1000e driver, whereas the front panel port uses Mellanox's mlxsw_spectrum driver.
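The same driver check can be scripted. The sketch below is a minimal example that assumes the `ethtool -i` output format shown above. `port_driver` and `is_front_panel_port` are hypothetical helper names, and treating any driver whose name starts with `mlxsw_` as a front panel port is an assumption.

```shell
# Hypothetical helper: print the value of the "driver:" line from
# `ethtool -i <port>` output read on standard input.
port_driver() {
    awk -F': ' '$1 == "driver" { print $2; exit }'
}

# Hypothetical helper: return success if the `ethtool -i` output on
# standard input comes from a port driven by an mlxsw driver
# (assumed here to mean a front panel port).
is_front_panel_port() {
    case "$(port_driver)" in
        mlxsw_*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `ethtool -i sw1p1 | is_front_panel_port && echo "sw1p1 is a front panel port"`.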

Physical Port Identification

Using udev rules

Starting with Linux 4.7, it is possible to create udev rules that rename the software interfaces (port netdevs) corresponding to the front panel ports according to the front panel numbering. To do so, add the following rule to /etc/udev/rules.d/10-local.rules:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="mlxsw_spectrum*", \
    NAME="sw1$attr{phys_port_name}"
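Because the rule builds the new name from the phys_port_name attribute, it can help to check which interfaces actually expose it before installing the rule. A minimal sketch, assuming the attribute is readable at `/sys/class/net/<netdev>/phys_port_name`. `list_phys_port_names` is a hypothetical helper, and its optional directory argument exists only so it can be pointed at a test tree.

```shell
# Hypothetical helper: print "<netdev> <phys_port_name>" for every
# interface that reports a physical port name. Interfaces whose driver
# does not implement phys_port_name are skipped (reading it fails).
list_phys_port_names() {
    net_dir=${1:-/sys/class/net}
    for dev in "$net_dir"/*; do
        name=$(cat "$dev/phys_port_name" 2>/dev/null) || continue
        [ -n "$name" ] || continue
        printf '%s %s\n' "${dev##*/}" "$name"
    done
}
```

Running `list_phys_port_names` with no argument inspects the live system; management ports driven by e1000e are expected to be skipped.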

Using ethtool

To identify the physical port that corresponds to a netdev, use ethtool to make the port's LED blink:

$ ethtool -p sw1p1

This command blinks the LED next to the port until ethtool is terminated (for example, with Ctrl+C). To blink the LED for a specific number of seconds instead, pass the duration as an additional argument:

$ ethtool -p sw1p1 5
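When mapping many cables at once, the same command can be run for several ports in turn. A minimal sketch: `blink_ports` is a hypothetical helper, and the `ETHTOOL` environment variable is an assumption added only so the ethtool binary can be substituted (for example, for testing).

```shell
# Hypothetical helper: blink the LED of each listed port for SECONDS,
# one port at a time, printing the port name before each blink.
# ETHTOOL (assumed, optional) overrides the ethtool binary.
# Usage: blink_ports SECONDS PORT...
blink_ports() {
    secs=$1
    shift
    for port in "$@"; do
        echo "Blinking $port for ${secs}s"
        "${ETHTOOL:-ethtool}" -p "$port" "$secs" || return 1
    done
}
```

For example, `blink_ports 5 sw1p1 sw1p2 sw1p3` blinks each of the three ports for five seconds in sequence and stops at the first ethtool failure.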

Using systemd

Starting with version 234, systemd can automatically name the ports according to their front panel numbering, without user intervention. This results in names such as enp3s0np5, where the np5 suffix denotes front panel port 5.

Note: This functionality was backported to systemd 231 in Fedora and is thus available in Fedora 25 and later.

Port Administrative State

After the switch boots or the driver is loaded, all ports are administratively down. The following command sets the administrative state of a port to up:

$ ip link set dev sw1p5 up

However, the operational state of the port only changes to up if the port is able to negotiate a link with its partner. In that case, the output looks as follows:

$ ip link show dev sw1p5
31: sw1p5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast switchid e41d2d45a9c0 state UP mode DEFAULT group default qlen 1000
    link/ether e4:1d:2d:45:a9:f5 brd ff:ff:ff:ff:ff:ff

To set the port to down, run:

$ ip link set dev sw1p5 down
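Because the administrative and operational states can differ, scripts usually check the operational state after setting a port up. A minimal sketch that extracts the word following `state` from the `ip link show` output shown above; `oper_state` is a hypothetical helper name.

```shell
# Hypothetical helper: print the operational state (the word after
# "state", e.g. UP or DOWN) from `ip link show dev <port>` output
# read on standard input.
oper_state() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "state") { print $(i + 1); exit }
    }'
}
```

For example: `[ "$(ip link show dev sw1p5 | oper_state)" = UP ] && echo "sw1p5 link is up"`. The kernel also exposes the same information, in lowercase, in /sys/class/net/sw1p5/operstate.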

Port MTU

To set the port MTU, run:

$ ip link set dev sw1p1 mtu 1400

The switch supports jumbo frames, so values higher than 1500 may be used.

Port Speed

Port speed is configured using the ethtool utility. Assuming the port's operational state is up, its current speed can be queried as follows:

$ ethtool sw1p5 | grep Speed
        Speed: 40000Mb/s

In this case the port's speed is 40Gb/s. To set a different speed, run:

$ ethtool -s sw1p5 speed 10000 autoneg off

This sets the port's speed to 10Gb/s. Assuming the administrative state of the port is up, this command makes the port go through link negotiation again by toggling its administrative state to down and then up. However, the port only goes up if its partner also supports the configured speed.

Because autoneg off is specified, the command also disables auto-negotiation, so the port only comes up at the configured speed. To let the switch auto-negotiate and choose the highest speed advertised by both link partners, re-enable auto-negotiation:

$ ethtool -s sw1p5 autoneg on

To query the port speed after speed negotiation, run:

$ ethtool sw1p5 | grep Speed
        Speed: 40000Mb/s
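To compare the negotiated speed against an expected value in a script, the number can be extracted from the ethtool output. A minimal sketch, assuming the `Speed: <N>Mb/s` line format shown above; `port_speed_mbps` is a hypothetical helper.

```shell
# Hypothetical helper: print the port speed in Mb/s as a bare number,
# reading `ethtool <port>` output on standard input. Nothing is printed
# when the speed is not numeric (for example, "Speed: Unknown!" while
# the link is down).
port_speed_mbps() {
    sed -n 's/^[[:space:]]*Speed: \([0-9][0-9]*\)Mb\/s$/\1/p'
}
```

For example: `[ "$(ethtool sw1p5 | port_speed_mbps)" = 40000 ] || echo "sw1p5 is not running at 40Gb/s"`.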

Port Statistics

Two types of statistics exist for each port:

  • Software
  • Hardware

Software statistics account for packets trapped to the CPU or packets sent from the CPU. Hardware statistics account for all packets going through the port, including those not trapped to or originating from the CPU.

Software Statistics

The ifstat utility is used to query the port's software statistics:

$ ifstat -x cpu sw1p5
#kernel
Interface        RX Pkts/Rate    TX Pkts/Rate    RX Data/Rate    TX Data/Rate
                 RX Errs/Drop    TX Errs/Drop    RX Over/Rate    TX Coll/Rate
sw1p5                  0 0             0 0             0 0             0 0
                       0 0             0 0             0 0             0 0

Hardware Statistics

Two utilities can be used to query the port's hardware statistics:

  • ip utility
  • ethtool utility

Using ip:

$ ip -s link show sw1p5
31: sw1p5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast switchid e41d2d45a9c0 state UP mode DEFAULT group default qlen 1000
    link/ether e4:1d:2d:45:a9:f5 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    136360     1868     0       1864    0       0
    TX: bytes  packets  errors  dropped carrier collsns
    776        8        0       0       0       0

Using ethtool:

$ ethtool -S sw1p5
NIC statistics:
     a_frames_transmitted_ok: 8500
     a_frames_received_ok: 772
     a_frame_check_sequence_errors: 0
     a_alignment_errors: 0
     a_octets_transmitted_ok: 874212
     a_octets_received_ok: 67968
     a_multicast_frames_xmitted_ok: 308
     a_broadcast_frames_xmitted_ok: 0
     a_multicast_frames_received_ok: 290
     a_broadcast_frames_received_ok: 0
     a_in_range_length_errors: 0
     a_out_of_range_length_field: 0
     a_frame_too_long_errors: 0
     a_symbol_error_during_carrier: 0
     a_mac_control_frames_transmitted: 0
     a_mac_control_frames_received: 0
     a_unsupported_opcodes_received: 0
     a_pause_mac_ctrl_frames_received: 0
     a_pause_mac_ctrl_frames_xmitted: 0
     if_in_discards: 0
     if_out_discards: 0
     if_out_errors: 0
     ether_stats_undersize_pkts: 0
     ether_stats_oversize_pkts: 0
     ether_stats_fragments: 0
     ether_pkts64octets: 0
     ...
     ether_pkts65to127octets: 0
     ...
     dot3stats_fcs_errors: 0
     dot3stats_symbol_errors: 0
     dot3control_in_unknown_opcodes: 0
     dot3in_pause_frames: 0
     discard_ingress_general: 0
     discard_ingress_policy_engine: 0
     discard_ingress_vlan_membership: 0
     discard_ingress_tag_frame_type: 0
     discard_egress_vlan_membership: 0
     discard_loopback_filter: 0
     discard_egress_general: 0
     discard_egress_hoq: 0
     discard_egress_policy_engine: 0
     discard_ingress_tx_link_down: 0
     discard_egress_stp_filter: 0
     discard_egress_sll: 0
     rx_octets_prio_0: 67968
     rx_frames_prio_0: 772
     tx_octets_prio_0: 874212
     tx_frames_prio_0: 8500
     rx_pause_prio_0: 0
     rx_pause_duration_prio_0: 0
     tx_pause_prio_0: 0
     tx_pause_duration_prio_0: 0
     ...
     tc_transmit_queue_tc_0: 0
     tc_no_buffer_discard_uc_tc_0: 0
     ...
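Individual counters are easier to monitor when extracted by name. A minimal sketch, assuming the `name: value` layout shown above; `hw_counter` is a hypothetical helper.

```shell
# Hypothetical helper: print the value of one counter, given its name,
# from `ethtool -S <port>` output read on standard input.
# Prints nothing if the counter is not present.
# Usage: ethtool -S sw1p5 | hw_counter if_in_discards
hw_counter() {
    awk -v name="$1" -F': ' '
        { sub(/^[ \t]+/, "", $1) }
        $1 == name { print $2; exit }
    '
}
```

Matching is exact, so for example `hw_counter a_frames_received_ok` does not also match a_multicast_frames_received_ok.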
Notes
  1. a_frames_transmitted_ok: Includes PAUSE frames transmitted by the port. The same applies to a_octets_transmitted_ok.

  2. a_frames_received_ok: Includes packets that are later discarded due to insufficient space in the port's headroom or because they are not admitted to the switch's shared buffer. The same applies to a_octets_received_ok.

  3. a_pause_mac_ctrl_frames_received: Includes both PAUSE and PFC frames. The same applies to a_pause_mac_ctrl_frames_xmitted.

  4. As defined in RFC 2863:

    if_in_discards - The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent them from being deliverable to a higher-layer protocol.

    if_out_discards - The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent them from being transmitted.

    if_out_errors - The number of outbound packets that could not be transmitted because of errors.

  5. As defined in RFC 2819:

    ether_stats_undersize_pkts - The total number of packets received that were less than 64 octets long (excluding framing bits, but including FCS octets) and were otherwise well formed.

    ether_stats_oversize_pkts - The total number of packets received that were longer than MTU octets (excluding framing bits, but including FCS octets) but were otherwise well formed.

    ether_stats_fragments - The total number of packets received that were less than 64 octets in length (excluding framing bits but including FCS octets) and had either a bad FCS with an integral number of octets (FCS error) or a bad FCS with a non-integral number of octets (alignment error).

    ether_pkts64octets - The total number of packets (including bad packets) received that were 64 octets in length (excluding framing bits but including FCS octets).

    ether_pkts<X>to<Y>octets - The total number of packets (including bad packets) received that were between X and Y octets in length (excluding framing bits but including FCS octets).

  6. As defined in RFC 3635:

    dot3stats_fcs_errors - A count of frames received that are an integral number of octets in length but do not pass the FCS check. This count does not include frames received with frame-too-long or frame-too-short errors.

    dot3stats_symbol_errors - The number of times the receiving media is non-idle (a carrier event) for a period of time equal to or greater than minFrameSize, and during which there was at least one occurrence of an event that causes the PHY to indicate 'Receive Error'.

    dot3control_in_unknown_opcodes - A count of MAC Control frames received that contain an opcode that is not supported.

    dot3in_pause_frames - A count of MAC Control frames received with an opcode indicating the PAUSE operation.

  7. Hardware-specific discard counters:

    discard_egress_general - In Spectrum, counts only MTU discards.

    discard_egress_hoq - Head-of-Queue time-out discards.

    discard_egress_sll - Number of packets dropped because the Switch Lifetime Limit was exceeded.

  8. rx_pause_prio_X: Number of PFC frames received from the far-end port with priority X. PAUSE frames increment the counters of all priorities.

  9. rx_pause_duration_prio_X: The total time in microseconds in which transmission of packets with priority X to the far-end port has been paused. PAUSE frames increment the counters of all priorities.

  10. tx_pause_prio_X: Number of PFC frames sent to the far-end port with priority X. PAUSE frames increment the counters of all priorities.

  11. tx_pause_duration_prio_X: The total time in microseconds that transmission of packets with priority X from the far-end port has been requested to pause.

  12. tc_transmit_queue_tc_X: The transmit queue depth in bytes of traffic class X.

  13. tc_no_buffer_discard_uc_tc_X: The number of unicast packets with traffic class X dropped due to lack of shared buffer resources.
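Per-traffic-class counters such as tc_no_buffer_discard_uc_tc_X can be summed to obtain a port-wide total. A minimal sketch; `sum_counters` is a hypothetical helper that adds every counter whose name starts with a given prefix. awk uses floating-point arithmetic, so sums above 2^53 may lose precision.

```shell
# Hypothetical helper: print the sum of all counters whose names start
# with PREFIX, reading `ethtool -S <port>` output on standard input.
# Prints 0 if no counter matches.
# Usage: ethtool -S sw1p5 | sum_counters tc_no_buffer_discard_uc_tc_
sum_counters() {
    awk -v prefix="$1" -F': ' '
        { sub(/^[ \t]+/, "", $1) }
        index($1, prefix) == 1 && $2 ~ /^[0-9]+$/ { total += $2 }
        END { printf "%.0f\n", total }
    '
}
```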

Resetting Statistics

The port's statistics are never reset while the driver is loaded. They can only be reset by unloading and reloading the driver.

However, it is possible to track changes in the hardware statistics using iproute2's ifstat utility. Each time it is executed, it shows the difference between the counter values at the previous invocation and the current ones:

$ ifstat sw1p5
#kernel
Interface        RX Pkts/Rate    TX Pkts/Rate    RX Data/Rate    TX Data/Rate
                 RX Errs/Drop    TX Errs/Drop    RX Over/Rate    TX Coll/Rate
sw1p5                  1 0             1 0            98 0           114 0
                       0 0             0 0             0 0             0 0

(... after some time passes ...)

$ ifstat sw1p5
#kernel
Interface        RX Pkts/Rate    TX Pkts/Rate    RX Data/Rate    TX Data/Rate
                 RX Errs/Drop    TX Errs/Drop    RX Over/Rate    TX Coll/Rate
sw1p5                  9 0             9 0           882 0          1026 0
                       0 0             0 0             0 0             0 0
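ifstat does not cover the driver-specific counters printed by `ethtool -S`. A similar effect can be achieved by saving two snapshots and subtracting them. A minimal sketch; `counter_delta` is a hypothetical helper, and it assumes the first snapshot file is non-empty.

```shell
# Hypothetical helper: print "<counter> <delta>" for every counter whose
# value changed between two saved `ethtool -S <port>` snapshots.
# Non-numeric lines (headers, "...") are ignored. Assumes BEFORE_FILE
# is non-empty.
# Usage: counter_delta BEFORE_FILE AFTER_FILE
counter_delta() {
    awk -F': ' '
        { sub(/^[ \t]+/, "", $1) }
        $2 !~ /^[0-9]+$/ { next }
        FNR == NR { before[$1] = $2; next }
        ($1 in before) && ($2 + 0) != (before[$1] + 0) {
            printf "%s %.0f\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}
```

For example, run `ethtool -S sw1p5 > before.txt`, wait, run `ethtool -S sw1p5 > after.txt`, and then `counter_delta before.txt after.txt`. The file names are placeholders.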

Port Splitting

Starting with Linux 4.6, it is possible to split and unsplit front panel ports using the devlink utility, which is part of the iproute2 package. Note that devlink is available in iproute2 starting with version 4.6.0.

Splitting

The following command splits the first front panel port into 4 ports:

$ devlink port split pci/0000:03:00.0/61 count 4

Here, pci/0000:03:00.0/61 is the DEV/PORT_INDEX handle used by devlink. It can be retrieved using the devlink port show command:

$ devlink port show
...
pci/0000:03:00.0/61: type eth netdev sw1p1
...

Assuming the previously described udev rule is used, sw1p1 disappears and the following net devices are created:

$ devlink port show
...
pci/0000:03:00.0/61: type eth netdev sw1p1s0 split_group 0
pci/0000:03:00.0/62: type eth netdev sw1p1s1 split_group 0
pci/0000:03:00.0/63: type eth netdev sw1p1s2 split_group 0
pci/0000:03:00.0/64: type eth netdev sw1p1s3 split_group 0
...

Note: In SN2700 and SN2410, splitting a port by four disables the adjacent port in the front panel column. So in the case above, both sw1p1 and sw1p2 disappear.
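The handle lookup can also be scripted, so that a port can be addressed by its netdev name. A minimal sketch, assuming the `devlink port show` output format shown above; `port_handle` is a hypothetical helper.

```shell
# Hypothetical helper: print the devlink DEV/PORT_INDEX handle of the
# port whose netdev is NETDEV, reading `devlink port show` output on
# standard input. Prints nothing if the netdev is not found.
# Usage: devlink port show | port_handle NETDEV
port_handle() {
    awk -v dev="$1" '{
        for (i = 2; i < NF; i++)
            if ($i == "netdev" && $(i + 1) == dev) {
                sub(/:$/, "", $1)
                print $1
                exit
            }
    }'
}
```

For example, before splitting sw1p1: `devlink port split "$(devlink port show | port_handle sw1p1)" count 4`.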

Unsplitting

The following command unsplits the previously split sw1p1 port:

$ devlink port unsplit pci/0000:03:00.0/62

The DEV/PORT_INDEX handle of any of the split ports can be used when unsplitting. The unsplit command recreates the original front panel ports: sw1p1 and sw1p2.

Port Module Information

To read the internal EEPROM of an SFP+/QSFP module, use the ethtool -m command. For example:

$ ethtool -m sw1p7
Identifier                                : 0x0d (QSFP+)
Extended identifier                       : 0x00
Extended identifier description           : 1.5W max. Power consumption
Extended identifier description           : No CDR in TX, No CDR in RX
Extended identifier description           : High Power Class (> 3.5 W) not enabled
Connector                                 : 0x23 (No separable connector)
Transceiver codes                         : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type                          : 40G Ethernet: 40G Base-CR4
Transceiver type                          : 100G Ethernet: 100G Base-CR4 or 25G Base-CR CA-L
Encoding                                  : 0x00 (unspecified)
BR, Nominal                               : 25500Mbps
Rate identifier                           : 0x00
Length (SMF,km)                           : 0km
Length (OM3 50um)                         : 0m
Length (OM2 50um)                         : 0m
Length (OM1 62.5um)                       : 0m
Length (Copper or Active cable)           : 1m
Transmitter technology                    : 0xa0 (Copper cable unequalized)
Attenuation at 2.5GHz                     : 2db
Attenuation at 5.0GHz                     : 3db
Attenuation at 7.0GHz                     : 4db
Attenuation at 12.9GHz                    : 7db
Vendor name                               : Mellanox
Vendor OUI                                : 00:02:c9
Vendor PN                                 : MCP1600-E00A
Vendor rev                                : A2
Vendor SN                                 : MT1526VS05742
Revision Compliance                       : SFF-8636 Rev 1.5
Module temperature                        : 0.00 degrees C / 32.00 degrees F
Module voltage                            : 0.0000 V
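Individual fields of the module EEPROM dump, such as the vendor part number, can be extracted by name, which is useful when inventorying cables across many ports. A minimal sketch, assuming the `Name : value` layout shown above; `module_field` is a hypothetical helper.

```shell
# Hypothetical helper: print the value of a named field (for example
# "Vendor PN") from `ethtool -m <port>` output on standard input.
# The first colon on a line separates the name from the value, so
# values containing colons (such as "Vendor OUI") are kept intact.
# Only the first matching line is printed.
module_field() {
    awk -v field="$1" '{
        sep = index($0, ":")
        if (sep == 0) next
        name = substr($0, 1, sep - 1)
        sub(/[ \t]+$/, "", name)
        if (name == field) {
            value = substr($0, sep + 1)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }
    }'
}
```

For example: `for p in sw1p1 sw1p7; do printf '%s %s\n' "$p" "$(ethtool -m "$p" | module_field 'Vendor PN')"; done`.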

Further Resources

  1. SN2700/SN2400 Hardware User Manual (PDF)
  2. man ethtool
  3. Writing udev rules
  4. man ip
  5. man devlink
  6. man devlink-dev
  7. man devlink-port
  8. man ifstat