-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
control/server.py: Enable DSA to offload and accelerate CRC calculation #704
Conversation
Hi, this is a PR that enables DSA to offload and accelerate CRC calculation. I have previously tested performance data using DSA acceleration in an SPDK+ceph environment. My manager and colleagues should have discussions with IBM colleagues about using DSA acceleration and show the performance data. In the container environment, I will test the performance of the gateway in the future. Since this feature has been implemented in SPDK, I just called the interface in the gateway. But how to map DSA devices into the container, I wrote it in README. If the users want to enable DSA, they need to manually change |
Hi @caroav , this one. |
@caroav I don't really know that code in spdk. So, I can't say whether or not it's OK. |
@caroav I like the idea of offloading where we can, but I'd prefer to see performance comparison results before a merge is considered. We'd also need to consider the impact this has in the orchestration layer in ceph since any potential change to container CAPS or bindings would need corresponding changes planned there too. |
@CongMinYin the default is that this feature is Disabled right? If that's true, then @pcuzner why do we need to be concerned? @CongMinYin do you have any comments on Paul's concerns? |
Yes, this feature is disabled by default. I know that changing container's |
Sorry for the late reply. I was asking my manager earlier if it's possible to display performance data on GitHub. Firstly, to answer the first question, DSA is a kind of DMA engine. It is an accelerator integrated into the CPU. Currently only supported on the 4th Gen. It is not an instruction set.
|
@@ -50,7 +50,7 @@ SPDK_URL="https://spdk.io" | |||
|
|||
SPDK_PKGDEP_ARGS="--rbd" | |||
# check spdk/configure --help | |||
SPDK_CONFIGURE_ARGS="--with-rbd --disable-tests --disable-unit-tests --disable-examples --enable-debug" | |||
SPDK_CONFIGURE_ARGS="--with-rbd --with-idxd --disable-tests --disable-unit-tests --disable-examples --enable-debug" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work on older CPU, please detect it and only is supported enable it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we build right now, it might not work on any CPU. To eliminate crashes on some machine we build the SPDK using x86-64-v2 architecture. This instructs the compiler not to use instructions which are not available on v2. We need to verify that the instructions used here are supported on v2 CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CongMinYin @gbregman the question is what happens if one one hand we build to x86-64-v2, but on the other hand we specify this flag (--with-idxd). I assume that the SPDK build will ignore the --with-idxd because it is not supported on x86-64-v2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caroav this can easily verified. We can just disassemble the SPDK binary and see if this instruction was used or not. This is what I did when we had the issue with RDSEED.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caroav I think we shuold check enable_dsa
in ceph-nvmeof.conf
, then configure with --with-idxd
flag. But I don't how to write something like "if else" in .env
or Dockerfile
. It seems like you're saying the x86-64-v2 instruction set. But in reality, DSA is a DMA engine, an accelerator only integrated into the new 4th Gen CPU now.
idxd
is a kernel driver. I have tried to configure idxd
on server with AMD CPU, it will report an error and exit. So it's best to check the enable_dsa
option in the ceph-nvmeof.conf
. Provide different parameters in .env
or Dockerfile
. But I couldn't find a similar code block. Do you know what to do?
Run configure in a server with AMD CPU:
root@amd-server4:~/spdk# ./configure --with-idxd
ERROR: IDXD cannot be used due to CPU incompatibility.
SPDK configure code to detect CPU:
https://github.com/spdk/spdk/blob/dfb2950fd5b13ba0a2a2b533e3e534202a89de74/configure#L741
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the IDXD code itself is safe for all hardware platforms:
- It is scanning for a specific PCIe device (PCI_DEVICE_ID_INTEL_IDXD) and will not initialize if this device is not found. Older Intel CPUs and AMD CPUs will not have this device.
- It needs to use the movdir64b instruction to pass work to the hardware, this is only supported on new Intel CPUs. However this instruction is only ever executed if the PCIe device is found. Due to lack of compiler support this instruction is being generated using a sequence of bytes in assembly code:
static inline void movdir64b(void *dst, const void *src) { asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02" : "=m"(*(char *)dst) : "d"(src), "a"(dst)); }
So building the code with --with-idxd will include a new assembly instruction, but it will only be executed on CPUs that support it because of the PCIe device check. Its very hard to prove this other than by code review.
The configure script for SPDK is only permitting the code to be compiled with --with-idxd if it is compiled on an Intel CPU. This means you can't use an AMD server as the build machine. The script is not checking the CPU version - so you can build the code with support for IDXD on an old Intel CPU.
I suggest that the check in the configure script is overly restrictive, and perhaps should be reduce to a warning rather than an error as the code may be built on a system with different capabilities to the system it is run on. It might also be better to check for the presence of the PCIe device rather than the CPU vendor to determine whether the build server supports idxd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, we need support for the code to be compiled for non-Intel CPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caroav Is our solution to try submitting a PR to modify the SPDK's config script, or check the enable_dsa
parameter in this ceph-nvmeof
project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @caroav @oritwas, I have submit a pr to solve the platforms errors: https://review.spdk.io/gerrit/c/spdk/spdk/+/24172. It merged. SPDK master branch and later versions won't met this platform errors.
@bill-scales FYI |
yes we will take care soon. |
@CongMinYin thankl you for your help!
def is_privileged_and_idxd_loaded():
# Check if the container is running in privileged mode
privileged = False
try:
with open('/proc/1/status', 'r') as f:
for line in f:
if line.startswith('CapEff'):
# CapEff shows the effective capabilities of the process
# If CapEff is all Fs, it indicates privileged mode
if 'ffffffffffffffff' in line.split()[1]:
privileged = True
break
except FileNotFoundError:
pass # This might indicate the code is not running in a container
# Check if the idxd module is loaded
idxd_loaded = False
try:
with open('/proc/modules', 'r') as f:
for line in f:
if line.startswith('idxd'):
idxd_loaded = True
break
except FileNotFoundError:
pass # This could indicate an issue with the file system or running environment
return privileged and idxd_loaded wdyt? |
Hi @baum , thank you for your feedback. I have replied below. I will delete the section
Sorry, I don't understand what operation I need to do. How to update this branch or patch to the current submodule?
Regarding the privilege mode, it may have been a misunderstanding caused by README I worte. The relationship between DSA 3 drivers:
|
ping @baum |
Sorry @CongMinYin missed your message. In short:
More details:
cd /path/to/ceph-nvmeof
cd spdk
git fetch origin
git checkout gerrit-24172
cd ..
git push your_fork wip-dsa
git submodule status This should show the updated commit hash corresponding to the gerrit-24172 branch in the spdk submodule. |
676683e
to
d94eaa9
Compare
@baum done. But CI failed in building nvmeof, ". |
@CongMinYin we will fix this issue probably in few hours. |
@CongMinYin could you please rebase, using latest .env ceph image? 🙏 |
@baum Rebased. 1 failed in |
Signed-off-by: Yin Congmin <[email protected]>
Signed-off-by: Yin Congmin <[email protected]>
ping @baum already rebase to the latest code. |
Created an integration branch for CongMinYin:wip-dsa + gerrit 24172 - see #857. The branch includes the two required changes, based on the current ceph-nvmeof and ceph/spdk.
Let me know if it looks OK to you. In case everything is as expected, please approve, so I could merge it. |
|
@baum #857 It looks good to me. Hi @bill-scales , sorry, I haven't actually tried compiling idxd on an ARM platform without an ARM machine. As you said IDXD is code safe itself, and instruction is being generated using a sequence of bytes in assembly code. Maybe you konw some about this error about
|
Intel® DSA can generate and test CRC checksum. This feature has already been implemented in SPDK. Enabling this feature allows for offloading and accelerating CRC calculations in NVMe-oF.