How to issue a PCIe FLR to CL #652

Open
ns-intusurg opened this issue Sep 23, 2024 · 11 comments

Comments

@ns-intusurg

Hi,

What is the runtime procedure to issue a PCIe Function Level Reset (sh_cl_flr_assert) to the CL?

I found the HDK's tb.issue_flr() command used for simulation, but I couldn't find any SDK runtime equivalent C function in the repo or in the documentation.

Thanks

@AWSjoeluc AWSjoeluc reopened this Sep 24, 2024
@AWSjoeluc

Hello! Thanks for reaching out with your question. I assume you've found mention of FLR in the documentation here: https://github.com/HFTrader/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md#function-level-reset-flr

Linux exposes function reset through /sys/bus/pci/devices/$BDF/reset, where $BDF is the domain:bus:device.function address of the targeted function (for example 0000:00:1d.0). To trigger an FLR, you can try either of the following commands:

echo 1 > /sys/bus/pci/devices/$BDF/reset

    or, if you are not running as root:

echo 1 | sudo tee -a /sys/bus/pci/devices/$BDF/reset
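
For scripted use, the same write can be wrapped with a couple of checks. This is only a sketch: pci_function_reset and PCI_SYSFS_ROOT are illustrative names made up for this example, not part of the AWS FPGA SDK, and the function must run as root for the write to succeed. It assumes the kernel creates the reset file only when it has a reset method for that function, so a missing file is reported rather than treated as a path typo.

```shell
#!/usr/bin/env bash
# Sketch: trigger a function reset through sysfs, with basic checks.
# PCI_SYSFS_ROOT and pci_function_reset are hypothetical names for this example.
PCI_SYSFS_ROOT="${PCI_SYSFS_ROOT:-/sys/bus/pci/devices}"

pci_function_reset() {
    local bdf="$1"                       # e.g. 0000:00:1d.0 (domain included)
    local dev="${PCI_SYSFS_ROOT}/${bdf}"

    # The function itself must exist in sysfs.
    if [ ! -d "$dev" ]; then
        echo "no such PCI function: $bdf" >&2
        return 2
    fi

    # No 'reset' attribute means the kernel found no usable reset method.
    if [ ! -e "$dev/reset" ]; then
        echo "$bdf: no reset attribute in sysfs" >&2
        return 3
    fi

    # Writing 1 asks the kernel to reset the function (requires root).
    echo 1 > "$dev/reset"
}
```

Calling pci_function_reset 0000:00:1d.0 from a root shell is then equivalent to the first command above, with distinct return codes for a missing device (2) and a missing reset attribute (3).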

@ns-intusurg
Author

I don't see "reset" listed under the PCI device directory, and I'm getting a "No such file or directory" error.

I'm targeting the following device path which is used during the test:
/sys/devices/pci0000:00/0000:00:1d.0

Here are the results of "ls -la" under that path:

..
uevent
.
vendor
subsystem -> ../../../bus/pci
xdma
driver -> ../../../bus/pci/drivers/xdma
subsystem_vendor
subsystem_device
device
resource4_wc
resource4
resource2_wc
resource2
resource1
resource0
revision
resource
rescan
remove
power_state
power
numa_node
msi_irqs
msi_bus
modalias
max_link_width
max_link_speed
local_cpus
local_cpulist
link
irq
firmware_node -> ../../LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:ea
enable
driver_override
dma_mask_bits
d3cold_allowed
current_link_width
current_link_speed
consistent_dma_mask_bits
config
class
broken_parity_status
ari_enabled

@AWSjoeluc

That's unexpected. Can you share which instance size and AMI you're using? What does lspci -d 1d0f: -vv report?

@ns-intusurg
Author

I'll have to ask IT about the instance size and AMI, as they set everything up and I don't have access to the Amazon admin account.

sudo lspci -d 1d0f: -vv

00:03.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
Physical Slot: 3
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at 85610000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown (ok), Width x0 (ok)
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b0] MSI-X: Enable+ Count=9 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Kernel driver in use: ena
Kernel modules: ena

00:1c.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller (prog-if 02 [NVM Express])
Physical Slot: 28
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 35
NUMA node: 0
Region 0: Memory at 85614000 (64-bit, non-prefetchable) [size=16K]
Region 2: Memory at 85620000 (64-bit, prefetchable) [size=8K]
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown (ok), Width x0 (ok)
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Kernel driver in use: nvme
Kernel modules: nvme

00:1d.0 Memory controller: Amazon.com, Inc. Device f000
Subsystem: Device fedd:1d51
Physical Slot: 29
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at 82000000 (32-bit, non-prefetchable) [size=32M]
Region 1: Memory at 85400000 (32-bit, non-prefetchable) [size=2M]
Region 2: Memory at 85600000 (64-bit, prefetchable) [size=64K]
Region 4: Memory at 2000000000 (64-bit, prefetchable) [size=128G]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] MSI-X: Enable+ Count=33 Masked-
Vector table: BAR=2 offset=00008000
PBA: BAR=2 offset=00008fe0
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Kernel driver in use: xdma
Kernel modules: xdma

00:1e.0 Memory controller: Amazon.com, Inc. Device 1041
Subsystem: Xilinx Corporation Device 0007
Physical Slot: 30
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Region 0: Memory at 85618000 (64-bit, prefetchable) [size=16K]
Region 2: Memory at 8561c000 (64-bit, prefetchable) [size=16K]
Region 4: Memory at 85000000 (64-bit, prefetchable) [size=4M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
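
In the output above, the DevCap line of every function, including 00:1d.0, reports FLReset-, meaning the function does not advertise Function Level Reset in its PCIe Device Capabilities. That would be consistent with the missing reset file, although the kernel can also fall back to other reset methods, so treat it as a hint rather than proof. A small filter for this check might look like the sketch below; flr_capability is a hypothetical helper name, and it only inspects the lspci text it is given on stdin.

```shell
#!/usr/bin/env bash
# Sketch: read `lspci -vv` text on stdin and report the first FLReset flag found.
# flr_capability is a hypothetical helper, not an lspci or AWS SDK feature.
flr_capability() {
    local flag
    # -o prints only the matched token; '|| true' tolerates "no match".
    flag=$(grep -o -m1 'FLReset[+-]' || true)
    case "$flag" in
        'FLReset+') echo supported ;;
        'FLReset-') echo unsupported ;;
        *)          echo unknown ;;
    esac
}
```

Usage, for a single function: sudo lspci -s 00:1d.0 -vv | flr_capability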

@AWSjoeluc

Great, thank you. A uname -a would also help in place of the full AMI ID (unless the kernel data contains sensitive information).

@ns-intusurg
Author

Still waiting for a reply from IT. Here's the command output with the hostname redacted:

uname -a
Linux ######## 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Mar 14 14:20:09 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

@AWSjoeluc

I'm currently investigating this behavior internally and hope to have a response by the end of this week. Thank you for your patience!

@ns-intusurg
Author

Just in case you still needed this info about our setup:

instance size - f1.2xlarge
ami - RHEL-8.7.0_HVM-20230330-x86_64-56-Hourly2-GP2

@ns-intusurg
Author

Hi @AWSjoeluc, did you ever figure out anything about the missing PCIe FLR for our instance?

I've been reloading the image before every test as a workaround, but having an FLR would be far more efficient.

@AWSjoeluc

Hello!

I'm glad to hear you have a workaround. FLRs were previously supported through a dedicated mailbox message. Unfortunately, that feature has been removed from the devkit. I've been working with the team to find a viable replacement for the FLR. I don't have an expected delivery date at this moment, but I will keep you updated!

@AWSjoeluc

Hello again, I wanted to let you know that we've recently released our F2 platform where the PCIe IP natively supports FLR: https://github.com/aws/aws-fpga/tree/f2

You can now write to the sysfs file /sys/bus/pci/devices/$BDF/reset to trigger an FLR and observe it in your CL design. I hope this helps!
