failed to find line: Mapped CXL Memory Device resource #40

Open
LIUQyou opened this issue Oct 6, 2023 · 3 comments
Labels
bug Something isn't working

Comments

LIUQyou commented Oct 6, 2023

Hi,

I have already built the kernel, but when I try to test the system I get the following error:

starting qemu. console output is logged to /tmp/rq_0.log
guest will be terminated after 15 minute(s)
+--------------------+
|  CXL Tests - FAIL  |
+--------------------+
failed to find line: Mapped CXL Memory Device resource
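
The failure message suggests that the test run looks for a specific kernel message in the console log. A minimal sketch of that kind of check, assuming it is a plain fixed-string search of the log file; the function name check_log_line is hypothetical and not taken from the script:

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): succeed only if the
# console log contains the expected line, mirroring the error text above.
check_log_line() {
    local log="$1" expected="$2"
    if grep -qF -- "$expected" "$log"; then
        echo "found: $expected"
    else
        echo "failed to find line: $expected"
        return 1
    fi
}

# Usage with the log path and message from this report:
#   check_log_line /tmp/rq_0.log "Mapped CXL Memory Device resource"
```

Running the same search by hand against /tmp/rq_0.log shows whether the message was never printed or was printed in a different form.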

I modified the script slightly so that it prints the QEMU command line:

/mydata/qemu/build/qemu-system-x86_64 -machine q35,accel=kvm,nvdimm=on,cxl=on -m 8192M,slots=4,maxmem=40964M -smp 8,sockets=2,cores=2,threads=2 -enable-kvm -display none -nographic -serial file:/tmp/rq_0.log -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd -debugcon file:uefi_debug.log -global isa-debugcon.iobase=0x402 -drive file=root.img,format=raw,media=disk -kernel mkosi.extra/boot/vmlinuz-5.19.0 -initrd mkosi.extra/boot/initramfs-5.19.0.img -append selinux=0 audit=0 console=tty0 console=ttyS0 root=/dev/sda2 ignore_loglevel rw memory_hotplug.memmap_on_memory=force cxl_acpi.dyndbg=+fplm cxl_pci.dyndbg=+fplm cxl_core.dyndbg=+fplm cxl_mem.dyndbg=+fplm cxl_pmem.dyndbg=+fplm cxl_port.dyndbg=+fplm cxl_region.dyndbg=+fplm cxl_test.dyndbg=+fplm cxl_mock.dyndbg=+fplm cxl_mock_mem.dyndbg=+fplm memmap=2G!4G efi_fake_mem=2G@6G:0x40000 -device e1000,netdev=net0,mac=52:54:00:12:34:56 -netdev user,id=net0,hostfwd=tcp::10022-:22 -object memory-backend-file,id=cxl-mem0,share=on,mem-path=cxltest0.raw,size=256M -object memory-backend-file,id=cxl-mem1,share=on,mem-path=cxltest1.raw,size=256M -object memory-backend-file,id=cxl-mem2,share=on,mem-path=cxltest2.raw,size=256M -object memory-backend-file,id=cxl-mem3,share=on,mem-path=cxltest3.raw,size=256M -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=lsa0.raw,size=1K -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=lsa1.raw,size=1K -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=lsa2.raw,size=1K -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=lsa3.raw,size=1K -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=53 -device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=191 -device cxl-rp,id=hb0rp0,bus=cxl.0,chassis=0,slot=0,port=0 -device cxl-rp,id=hb0rp1,bus=cxl.0,chassis=0,slot=1,port=1 -device cxl-rp,id=hb1rp0,bus=cxl.1,chassis=0,slot=2,port=0 -device cxl-rp,id=hb1rp1,bus=cxl.1,chassis=0,slot=3,port=1 -device 
cxl-type3,bus=hb0rp0,memdev=cxl-mem0,id=cxl-dev0,lsa=cxl-lsa0 -device cxl-type3,bus=hb0rp1,memdev=cxl-mem1,id=cxl-dev1,lsa=cxl-lsa1 -device cxl-type3,bus=hb1rp0,memdev=cxl-mem2,id=cxl-dev2,lsa=cxl-lsa2 -device cxl-type3,bus=hb1rp1,memdev=cxl-mem3,id=cxl-dev3,lsa=cxl-lsa3 -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k,cxl-fmw.1.targets.0=cxl.0,cxl-fmw.1.targets.1=cxl.1,cxl-fmw.1.size=4G,cxl-fmw.1.interleave-granularity=8k -qmp unix:/tmp/run_qemu_qmp_0,server,nowait -snapshot -object memory-backend-ram,id=mem0,size=2048M -numa node,nodeid=0,memdev=mem0, -numa cpu,node-id=0,socket-id=0 -object memory-backend-ram,id=mem1,size=2048M -numa node,nodeid=1,memdev=mem1, -numa cpu,node-id=1,socket-id=1 -object memory-backend-ram,id=mem2,size=2048M -numa node,nodeid=2,memdev=mem2, -object memory-backend-ram,id=mem3,size=2048M -numa node,nodeid=3,memdev=mem3, -numa node,nodeid=4, -object memory-backend-file,id=nvmem0,share=on,mem-path=nvdimm-0,size=16384M,align=1G -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=4 -numa node,nodeid=5, -object memory-backend-file,id=nvmem1,share=on,mem-path=nvdimm-1,size=16384M,align=1G -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=5 -numa dist,src=0,dst=0,val=10 -numa dist,src=0,dst=1,val=21 -numa dist,src=0,dst=2,val=12 -numa dist,src=0,dst=3,val=21 -numa dist,src=0,dst=4,val=17 -numa dist,src=0,dst=5,val=28 -numa dist,src=1,dst=1,val=10 -numa dist,src=1,dst=2,val=21 -numa dist,src=1,dst=3,val=12 -numa dist,src=1,dst=4,val=28 -numa dist,src=1,dst=5,val=17 -numa dist,src=2,dst=2,val=10 -numa dist,src=2,dst=3,val=21 -numa dist,src=2,dst=4,val=28 -numa dist,src=2,dst=5,val=28 -numa dist,src=3,dst=3,val=10 -numa dist,src=3,dst=4,val=28 -numa dist,src=3,dst=5,val=28 -numa dist,src=4,dst=4,val=10 -numa dist,src=4,dst=5,val=28 -numa dist,src=5,dst=5,val=10

I then checked the log file, and nothing in it looks wrong:

[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Closed Network Service Netlink Socket.
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target System Shutdown.
[  OK  ] Reached target Late Shutdown Services.
[  OK  ] Finished System Power Off.
[  OK  ] Reached target System Power Off.
[   10.650377] systemd-shutdown[1]: Syncing filesystems and block devices.
[   10.689438] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[   10.717339] systemd-journald[449]: Received SIGTERM from PID 1 (systemd-shutdow).
[   10.731498] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[   10.745673] systemd-shutdown[1]: Unmounting file systems.
[   10.749528] [647]: Remounting '/' read-only in with options '(null)'.
[   10.757674] EXT4-fs (sda2): re-mounted. Quota mode: none.
[   10.763603] systemd-shutdown[1]: All filesystems unmounted.
[   10.765736] systemd-shutdown[1]: Deactivating swaps.
[   10.767481] systemd-shutdown[1]: All swaps deactivated.
[   10.769480] systemd-shutdown[1]: Detaching loop devices.
[   10.774794] systemd-shutdown[1]: All loop devices detached.
[   10.777082] systemd-shutdown[1]: Stopping MD devices.
[   10.779097] systemd-shutdown[1]: All MD devices stopped.
[   10.780462] systemd-shutdown[1]: Detaching DM devices.
[   10.781864] systemd-shutdown[1]: All DM devices detached.
[   10.783232] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[   10.790130] systemd-shutdown[1]: Syncing filesystems and block devices.
[   10.792615] systemd-shutdown[1]: Powering off.
[   10.795976] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   10.798077] sd 0:0:0:0: [sda] Stopping disk
[   10.981353] ACPI: PM: Preparing to enter system sleep state S5
[   10.982950] reboot: Power down

I am using kernel 5.19.
The CXL-related part of the config looks like this:

$ grep -i cxl .config
CONFIG_CXL_BUS=m
CONFIG_CXL_PCI=m
CONFIG_CXL_MEM_RAW_COMMANDS=y
CONFIG_CXL_ACPI=m
CONFIG_CXL_PMEM=m
CONFIG_CXL_MEM=m
CONFIG_CXL_PORT=m
CONFIG_CXL_SUSPEND=y
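
When comparing a config against a list of required options, it can help to print only the ones that are missing. A small sketch, assuming the caller supplies the option names to check; the helper name missing_config_opts is hypothetical, and the options in the usage line are simply the ones from the grep output above, not an authoritative list of requirements:

```shell
#!/bin/bash
# Hypothetical helper: print every option from the argument list that is
# not enabled (=y or =m) in the given kernel config file.
missing_config_opts() {
    local config="$1"; shift
    local opt
    for opt in "$@"; do
        grep -Eq "^${opt}=(y|m)$" "$config" || echo "$opt"
    done
}

# Usage (option names taken from the grep output above):
#   missing_config_opts .config CONFIG_CXL_BUS CONFIG_CXL_PCI CONFIG_CXL_MEM
```

An empty result means every listed option is set to y or m.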

Could the Linux kernel version be the problem? The image I built is Ubuntu 22.04.
I also logged into the image without running the tests. The mode of the pmem namespace is:

root@localhost:~# ndctl list
[
  {
    "dev":"namespace0.0",
    "mode":"raw",
    "size":2147483648,
    "sector_size":512,
    "blockdev":"pmem0"
  }
]
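
To check the mode of every namespace at a glance, the JSON can be reduced to "device mode" pairs. A rough sketch, assuming the one-key-per-line layout shown above; the helper name namespace_modes is hypothetical, and a real JSON parser would be more robust than this awk filter:

```shell
#!/bin/bash
# Hypothetical helper: read `ndctl list` JSON on stdin and print one
# "<dev> <mode>" line per namespace. Relies on each key sitting on its
# own line, as in the output above.
namespace_modes() {
    awk -F'"' '
        $2 == "dev"  { dev = $4 }
        $2 == "mode" { print dev, $4 }
    '
}

# Usage inside the guest:
#   ndctl list | namespace_modes
```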

The mode is raw. I am not sure which part is wrong.
By the way, I made one small modification to the script: I changed the defaults to : "${distro:=ubuntu}" and : "${rev:=jammy}", as shown here:

#!/bin/bash -Ee
# SPDX-License-Identifier: CC0-1.0
# Copyright (C) 2021 Intel Corporation. All rights reserved.

# default config
: "${builddir:=./qbuild}"
rootpw="root"
rootfssize="10G"
nvme_size="1G"
efi_mem_size="2"   #in GiB
legacy_pmem_size="2"   #in GiB
pmem_size="16384"  #in MiB
pmem_label_size=2  #in MiB
pmem_final_size="$((pmem_size + pmem_label_size))"
: "${qemu:=qemu-system-x86_64}"
: "${gdb:=gdb}"
: "${distro:=ubuntu}"
: "${rev:=jammy}"
: "${ndctl:=$(readlink -f ~/git/ndctl)}"
selftests_home=root/built-selftests
mkosi_bin="mkosi"
mkosi_opts=("-i" "-f")

# some canned hmat defaults - make configurable as/when needed
# terminology:
# local = attached directly to the socket in question
# far = memory controller is on 'this' socket, but distinct numa node/pxm domain
# cross = memory controller across sockets
# mem = memory node and pmem = NVDIMM node, as before
# Units: lat(ency) - nanoseconds, bw - MB/s
local_mem_lat=5
local_mem_bw=2000
far_mem_lat=10
far_mem_bw=1500
cross_mem_lat=20
cross_mem_bw=1000
# local_pmem is not a thing. In these configs we always give pmems their own node
far_pmem_lat=30
far_pmem_bw=1000
cross_pmem_lat=40
cross_pmem_bw=500

# similarly, some canned SLIT defaults
local_mem_dist=10
far_mem_dist=12
cross_mem_dist=21
far_pmem_dist=17
cross_pmem_dist=28

# CXL device params
cxl_addr="0x4c00000000"
cxl_backend_size="512M"
cxl_t3_size="256M"
cxl_label_size="1K"
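
The : "${var:=default}" lines in the excerpt only assign a value when the variable is unset or empty, so values like distro and rev can usually be supplied from the environment instead of editing the script. A small sketch of the idiom; the function name show_distro and the placeholder default values are illustrative only and do not reflect the script's original defaults:

```shell
#!/bin/bash
# Illustration of the ':=' default idiom. The function body runs in a
# subshell so the assignments do not leak into the calling shell.
# The placeholder values are hypothetical.
show_distro() (
    : "${distro:=placeholder-distro}"
    : "${rev:=placeholder-rev}"
    echo "$distro $rev"
)

# With nothing set, the defaults apply:
#   show_distro
# With values supplied by the caller, the defaults are skipped:
#   distro=ubuntu rev=jammy show_distro
```

Whether the script actually receives these variables from the environment in a given setup is an assumption worth verifying before relying on it.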

Looking forward to your reply.

Thank you very much

LIUQyou commented Oct 6, 2023

I switched to kernel 6.3 and enabled the config options listed in [https://github.com/pmem/ndctl].
The namespace mode is now fsdax, but --cxl-test-run still runs into problems:

[  641.072606] RIP: 0033:0x7fa6e331ea3d
[  641.073190] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[  641.076135] RSP: 002b:00007ffeafde6028 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  641.077340] RAX: ffffffffffffffda RBX: 00007ffeafde60e8 RCX: 00007fa6e331ea3d
[  641.078478] RDX: 00007fa6e32425f3 RSI: 0000000000000000 RDI: 0000000000000011
[  641.079619] RBP: 00007ffeafde60e0 R08: 0000000000000000 R09: 0000000000000006
[  641.080756] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
[  641.081894] R13: 0000000000000016 R14: 0000000000000000 R15: 0000000000000000
[  641.083037]  </TASK>
[  641.083405] Modules linked in: nd_pmem dax_cxl dax_pmem nd_btt cxl_mock_mem(ON) cxl_pci nd_e820 cxl_test(ON) cxl_mem(ON) cxl_port(ON) cxl_pmem(ON) cxl_acpi(ON) cxl_mock(ON) libnvdimm cxl_core(ON) efivarfs
[  641.086276] CR2: 00003ae1a4720328
[  641.086833] ---[ end trace 0000000000000000 ]---
[  641.087581] RIP: 0010:_raw_spin_trylock+0xe/0x50
[  641.088329] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 53 48 89 fb bf 01 00 00 00 e8 b2 5a 25 ff <8b> 03 85 c0 75 16 ba 01 00 00 00 f0 0f b1 13 b8 01 00 00 00 75 06
[  641.091274] RSP: 0018:ffffc1154165fc90 EFLAGS: 00010297
[  641.092113] RAX: 0000000000000002 RBX: 0000000000000058 RCX: ffffffff00000000
[  641.093256] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[  641.094384] RBP: ffff9c8ed0000000 R08: 0000000000000064 R09: 0000000000000000
[  641.095520] R10: ffff9b9f3fe7b000 R11: 0000000a03052ff5 R12: ffff9c8ed0000058
[  641.096654] R13: 0000000000000000 R14: ffff9c8ed0000000 R15: 0000000000000001
[  641.097797] FS:  00007fa6e38c2340(0000) GS:ffff9b9f39c80000(0000) knlGS:0000000000000000
[  641.099076] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  641.100000] CR2: 00003ae1a4720328 CR3: 0000000201116000 CR4: 00000000000006e0
[  641.101140] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  641.102279] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  641.103421] note: systemd[1] exited with irqs disabled
[  641.104267] note: systemd[1] exited with preempt_count 1
[  641.105137] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[  641.106577] Kernel Offset: 0x19200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  641.108343] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
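
In a trace like this, the most useful lines for a report are usually the kernel-mode RIP (code segment 0010 on x86_64, as opposed to the user-mode 0033 entry) and the panic message itself. A minimal sketch for pulling those out of a console log; the helper name panic_summary is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the kernel-mode RIP lines and the panic
# headline lines from a console log, skipping register dumps.
panic_summary() {
    grep -E 'RIP: 0010:|Kernel panic - not syncing' "$1"
}

# Usage with the console log path from the first comment:
#   panic_summary /tmp/rq_0.log
```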

Could you provide an example kernel .config or a known-good kernel version, so that we know which one to use?

Thank you very much

@y17yeshwanth

Hi, were you able to fix the issue? I am stuck on a similar problem: I have built the kernel, but the sanity tests fail when I run them.

@stellarhopper
Member

@LIUQyou @y17yeshwanth Yeah, --cxl-test-run and --nfit-test-run are currently broken. I need to either remove them or fix them. For now, I'd suggest just booting the guest normally with --cxl-test or --nfit-test as needed and running the tests manually (it also depends on what your goals are with all of this).

marc-hb added the bug label on Dec 13, 2024