"could not alloc memory for discovery log page" and mount failure with Corsair 8TB drive over RDMA nvmet #807
Not sure this will make any difference for RDMA; I'm more familiar with TCP. For TCP, the Service ID used for discovery is typically 8009, and the Service ID for I/O is 4420. You could try 8009.
This looks like the RDMA transport is having an issue. libnvme/nvme-cli doesn't differentiate between the transports, and neither does the nvme subsystem core. What you could do is dump the discovery log page for TCP and for RDMA and compare them. FWIW, Linux's nvmet implementation (aka soft target) doesn't implement the discovery controller on port 8009.
For the service ID: I am explicitly setting this when I set up nvmet on the server. I have changed it to 8009, but it doesn't make any difference. Also, I am quite sure it connects to the right service ID, since when the ID differs between client and server, nvme discover / connect complain that they cannot connect. I ran … Could you be more specific about which log page to obtain, and with which command?
The logs show that the initial header fetch is already corrupted for RDMA. So it's not the loop going wrong. TCP:
RDMA:
I just realized we need a block device in order to run the command:

```
# nvme connect -t rdma -a XXX -s 4420 -n nqn.2014-08.org.nvmexpress.discovery
[ get the nvme device from dmesg, it won't show up in nvme list ]
# nvme get-log /dev/nvme$X -i 0x70 -l 4096
```
Yes, as I wrote in the original post, the numrec field in the header has a bogus value. Thus it computes a very large size to allocate for the log page, and fails the allocation.
thx, the 0x70 was what I was missing. Will get that tonight, connecting to the server both via TCP and via RDMA, and will post the logs here. Thx for the quick support!
I was just making sure the loop works. We had some bugs in this loop, thus I am a bit paranoid. FWIW, in the base spec the output format is specified under the Discovery Log Page (log identifier 70h).
I cannot get a log for id 70h / 112:

While obtaining the log for id 2 (SMART), for example, works, so nvme get-log is working in general, just not for id 70h.
Are you sure you connected to the discovery controller? A discovery controller doesn't have namespaces, so it's important to use the correct NQN: `nqn.2014-08.org.nvmexpress.discovery`
The header is not necessarily corrupted. We never evaluate the CQE status when calling nvme_get_log_page(); it surely will return an errno when the ioctl fails, but unfortunately ioctl success just means that we got a CQE back, not that the CQE does not carry an error status ...
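The distinction can be sketched as a tiny helper. This is a hypothetical illustration of the pattern described above, not the actual libnvme API:

```c
#include <assert.h>

/*
 * Hypothetical illustration, not the libnvme API: a successful ioctl
 * only means a CQE arrived; the status carried inside the CQE must be
 * checked separately before trusting the returned buffer.
 */
enum cmd_outcome {
	CMD_OK,			/* command truly succeeded */
	CMD_TRANSPORT_ERR,	/* ioctl failed, errno is set */
	CMD_NVME_STATUS,	/* CQE has a non-zero status field */
};

static enum cmd_outcome classify_result(int ret)
{
	if (ret < 0)
		return CMD_TRANSPORT_ERR;
	if (ret > 0)
		return CMD_NVME_STATUS;
	return CMD_OK;
}
```

With this split, a caller would only read the log-page buffer in the `CMD_OK` case instead of treating any non-failing ioctl as success.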
Check if #810 helps here.
I am not fully sure what to do :( I have done …
'discovery' works without a controller connected.
But this is what I did before, see the logs here: #807 (comment)
I tried with this PR on top of master, but I am still getting:
I deployed the new version only on the client so far; do I also need to update it on the server?
Can you try with this patch:
This gives:

As I wrote in the original post, this numrec entry always seems bogus.
Again, here are the exact commands you need to execute:

Note the device is …
@igaw : OK, thx a lot, now I understand; it wasn't clear to me that the discovery controller is an nvme device by itself. Please find the logs here: I set up nvmet for tcp on service 4420 and rdma on 4421, both exposing 3 subsystems with 1 NVMe device each.
Both transports have the same header:
@davidrohr could you retry with the latest versions from the git trees (libnvme and nvme-cli)? @hreinecke updated the status and error handling in the discovery code; there was a bit of confusion in the error handling. See the README of nvme-cli to make sure you pick up the correct libnvme.
@igaw : Thx, I rebuilt libnvme and nvme-cli from the upstream repos, and now the discovery over rdma is failing with … See the full log here:
Alright, this means the error handling now catches the failing commands and we don't blindly continue. This is better, but obviously we haven't figured out why the commands fail. The first failing command is already the identify controller. I don't really see how TCP can succeed while RDMA fails. I think it's possible to collect the wire conversation between host and target using wireshark/tcpdump. Could you try to record the RDMA attempt?
Hm, I just tried with wireshark, and I can capture the traffic when connecting over TCP (also when connecting with TCP over IPoIB on the Infiniband interface). But when discovering with RDMA, wireshark does not capture anything. Even the command that apparently succeeds does not show anything in wireshark. I can have a look another day to see if I manage to record something, or if I can troubleshoot it by adjusting the libnvme code.
Thanks for the experiment. I haven't played with RDMA yet, so it was just a guess that it might work. Maybe post the output from the discover command with verbose logging enabled.
ok, interestingly, trying a few times, I have seen that now sometimes I get the …
ahh, maybe it's a buffer alignment problem:
That really looks like a buffer alignment issue. RDMA is working on pages (that's the 'dma' bit in RDMA ...), so I would expect it to require the buffer to be page aligned. Hmm.
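As a minimal sketch of the suspicion (assuming the transport does need page-aligned buffers): plain `malloc` only guarantees alignment for the largest standard type, typically 16 bytes, while `posix_memalign` can guarantee page alignment:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate a buffer aligned to the system page size; returns NULL on
 * failure. malloc() only guarantees _Alignof(max_align_t), which is
 * far below a 4 KiB page, so buffers from malloc() would usually not
 * satisfy a page-alignment requirement.
 */
static void *page_aligned_alloc(size_t len)
{
	void *buf = NULL;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	if (posix_memalign(&buf, page, len) != 0)
		return NULL;
	return buf;
}
```

If the RDMA path really requires this, any log-page buffer obtained via a plain `malloc` would explain intermittent garbage depending on where the allocator happens to place it.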
Can you try again after updating nvme-cli? |
I tried with the latest nvme-cli. Attached is a log with several attempts.
The first identify command returns:

This is a really funky value. The rest seems to be okay. I don't have a lot of experience with RDMA. Are you sure the host and the target can 'talk' to each other correctly? Is there a way to check this?
Yes, as I wrote in the original post, I can mount one of the 3 disks via rdma, and that disk works normally; I also get a decent 1 GB/s transfer rate with it.
For the other 2 disks, mounting fails if I connect via rdma, but I can still read them correctly via dd.
I'm hitting a bug which looks very similar to this. I get it when running a blktests test case with the RDMA transport but not with the other transports. This seems to be a kernel thing. Maybe I can shed some light on it with the reproducer in my test bench.
Until now the funky result values have not been correctly propagated. This is why I see these now as well. Still searching for where we do not set the result field correctly.
Are there any updates on this? Can I do anything to help with debugging?
I've got a kernel patch for rdma but haven't found time to post it yet. If you could test it, that would really help.
I just tested with your patch (applied to the 6.8 kernel / gentoo linux kernel 6.8.7), and used libnvme and nvme-cli from git master of today. It still doesn't work and fails in the same way as before. Here are the logs:
Alright, so the …
I executed the command a couple of times. Most times the result seems correct, but once I apparently got garbage:
Either the transport is having trouble or the target implementation is at fault. I think it's more likely that the target has some sort of race and libnvme hits it very reliably: libnvme sends two commands back to back and the second gets corrupted data. I suspect from your last test that if we waited a bit between the commands, the second could succeed. Maybe we see something in the ftrace output (on the target):

```
# cd /sys/kernel/debug/tracing/
# echo 1 > events/nvmet/enable
# echo 1 > tracing_on
[wait for the failure]
# echo 0 > tracing_on
# cat trace > ~/nvmet-trace.txt
```

And could you also enable the debug logs on the target side?

```
# echo 8 > /proc/sys/kernel/printk
# echo 'file drivers/nvme/target/*.c +p' > /sys/kernel/debug/dynamic_debug/control
```
Sure, will do so. But I'll be traveling the next days, so only have access to the system in ~1 week. |
Still interested in getting this working? In the meantime the kernel patch has been accepted, so the random …
Sure, I'd like to have this fixed.
Okay, no worries, I am busy too. I've double-checked the commands and on my system they work. So it really sounds like you're missing the kernel config options. Are you running a distro kernel? In any case, the CONFIG_FTRACE and CONFIG_DYNAMIC_DEBUG options would be necessary.
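A quick way to check whether those options are enabled (a sketch; it assumes the config is exposed via `/proc/config.gz` or `/boot/config-$(uname -r)`, which depends on the distro):

```shell
# Report whether the given kernel config options are enabled in the
# running kernel. Prints "missing or not readable" when neither
# /proc/config.gz nor /boot/config-$(uname -r) shows the option as =y.
check_kconfig() {
    for opt in "$@"; do
        if zgrep -q "^${opt}=y" /proc/config.gz 2>/dev/null \
           || grep -q "^${opt}=y" "/boot/config-$(uname -r)" 2>/dev/null; then
            echo "$opt enabled"
        else
            echo "$opt missing or not readable"
        fi
    done
}

check_kconfig CONFIG_FTRACE CONFIG_DYNAMIC_DEBUG
```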
I finally found some time to get the traces with the …
So the …
Just for reference, I had the chance to test with ConnectX6 NICs instead of ConnectX3, and I got the very same problem. Otherwise, I didn't have time to investigate further or report.
Some more info for reference: as I said, I can do … I have also run …
What I am wondering as a side note: didn't you say the error propagation should be fixed now? Shouldn't it produce a proper error message by now? I just tried with nvme-cli and libnvme master of today.
I had to revert those error propagation changes. I understood wrongly how the error path works, and those changes broke a bunch of stuff. I have to look into this once again; maybe this time I see something. There must be something very weird about the get log page command if connect just works. Maybe some other admin commands would fail too, e.g. … One thing which could be wrong with the …

But again, we know that the discover command set up by nvme-cli also fails and sees garbage in the return buffer. I wonder if any of the other admin commands also fail? This could point to the admin handling code in nvmet. Could you try to trigger an error with …
Also, can you do a test to see if …
Thx for the comments. I am traveling and will be back on Tuesday; then I'll try immediately. Yesterday I noticed one more strange thing:
I repeatedly ran … Out of curiosity, I then tried with … Trying with … I am pasting 2 examples below.

Then I tried to trigger an error with nvme id-ctrl, but it seems that never fails. Output is below.

Unfortunately, I will not have time for debugging in the next 2 months since I will be away December and January.

id-ctrl output:

Examples:
Alright, I think with this data the picture gets a bit clearer. When a 4k block is requested, 'the' allocation is aligned; smaller block sizes are likely not correctly aligned. The only problem now is to figure out where 'the' allocation is not done correctly. I think with this data we can start to ask on the nvme mailing list if someone has an idea. Anyway, thanks a lot for the patience to debug this.
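If page alignment really is the requirement, one plausible direction for 'the' allocation is to round every log-page buffer up to a page multiple and allocate it page-aligned, so that small reads get the same alignment a 4k read happens to get. This is a sketch under that assumption, not a confirmed libnvme fix:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/*
 * Round len up to a multiple of the page size and return a page-aligned
 * buffer, so small get-log reads (512 B, 2 KiB, ...) behave like 4 KiB
 * reads. Assumption only: this mirrors the suspected RDMA alignment
 * requirement discussed in this thread.
 */
static void *alloc_log_buffer(size_t len, size_t *out_len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t rounded = (len + page - 1) & ~(page - 1);
	void *buf = NULL;

	if (posix_memalign(&buf, page, rounded) != 0)
		return NULL;
	if (out_len)
		*out_len = rounded;
	return buf;
}
```

A caller would still request only `len` bytes from the device; the rounding only affects how much memory backs the buffer.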
Great, thx for looking into it! Will you ask on the list? As I wrote before, I'll be away until the beginning of February with only sporadic mail access.
Yes, I'll try to condense all the info we have into something useful and post it on the mailing list. Sure, enjoy the time off :)
I am posting this as a libnvme issue, but I am not really sure if the problem is in libnvme, nvme-cli, the kernel, or somewhere else.

I have 2 computers, pc1 and pc2, connected via ConnectX3 Infiniband adapters, with pc1 mounting 3 remote nvme SSDs from pc2: a Samsung 990 Pro 2TB and 2 Corsair MP600 Pro 8TB.

I am having 2 problems:

- When I run `nvme discover -t rdma -a [IP] -s 4420`, I get the error: `could not alloc memory for discovery log page`.
- I can connect with `nvme connect -n [SUBSYS] -t rdma -a [IP] -s 4420`, and then the Samsung 2TB SSD seems to work normally, but I cannot mount a file system from the Corsair SSDs (even though I see them normally in `/proc/partitions`).

I tried the following things already:

- Using the `tcp` type instead of `rdma` in all nvme commands: everything works.
- … `nvme discover`.

Regarding the `nvme discover` error: I debugged what happens in libnvme when I get the `could not alloc memory for discovery log page`, and it seems the `numrec` value here (libnvme/src/nvme/fabrics.c, line 1089 in 691f809) is bogus, so a huge allocation size is computed and the allocation fails.

Regarding the mount failure: I get `FAT-fs (nvme1n1p1): bogus number of reserved sectors`, and from the mount command: `wrong fs type, bad option, bad superblock on ..., missing codepage or helper program, or other error`. Mounting the very same file system when connecting via `tcp` instead of `rdma` works correctly. I wanted to check if I get corrupted data from this disk via `rdma`, so I dumped the whole filesystem to an external disk with `dd` and compared what I read when connecting with `rdma` and with `tcp`: it is fully identical. So the data I am getting from the disk is correct; mounting fails for a different reason. The sector size and disk size I get when opening the disk with `gdisk` are also identical.

So in summary: everything works when connecting via `tcp`. When connecting via `rdma`, `nvme discover` fails since the `numrec` entry is bogus, and while I can mount filesystems from the 2TB Samsung SSD correctly, filesystems from the 8TB Corsair SSDs fail to mount, even though the data I read from the disk with `dd` is fully correct.

I would be thankful for any advice, or a recommendation where to ask. If you'd need me to conduct any tests, please let me know.