PVM host kernel panic after restore from snapshot in Cloud Hypervisor on EC2 #7

Open
pojntfx opened this issue Apr 19, 2024 · 11 comments
Labels: bug, live migration

pojntfx commented Apr 19, 2024

Description

We're testing PVM on AWS EC2 and running into issues with restoring snapshots there (on c5.large in particular, but the problem also occurs on all other non-bare-metal instance types, both Intel and AMD). Snapshot restores work well on GCP non-bare-metal hosts with Cloud Hypervisor (and on bare-metal hosts with the Intel KVM module unloaded and the PVM module loaded), but they lead to host kernel panics on AWS EC2. Before the kernel crashes, the load step also prints error messages (see further down) about the hypervisor failing to set MSRs for the guest; a similar error occurs with Firecracker's snapshot restores, too.

Note that this only happens when the snapshot is resumed on a different host than the one it was created on, i.e. when a snapshot is created on host A, moved from host A to host B, and resumed on host B. Both instances are of the same type (same CPU etc.). It does not occur if the snapshot is simply resumed on the same host it was created on.

Reproduction

This was tested against the latest pvm branch of this repository with the PVM-provided guest kernel config, and it happens with both of the following host kernel configs: Fedora's bare-metal config and AWS's default config, each with PVM enabled and loaded. We've set up a CI/CD repo to make it easy to use those kernels on EC2: https://github.com/loopholelabs/linux-pvm-ci
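
A quick sanity check before each run is to confirm that the PVM host module is actually loaded on the EC2 host (standard tooling only; kvm_pvm is the module name that also appears in the panic's module list below):

lsmod | grep -E '^kvm'            # expect both kvm_pvm and kvm to be listed
cat /sys/module/kvm_pvm/refcnt    # 0 while no VM is running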

Snapshot/Restore

To reproduce (35.89.175.21 is the first EC2 host, 34.214.167.180 is the second EC2 host, and rsync is used to sync the snapshot, rootfs etc. between the hosts):

rsync -a --progress /tmp/pvm-experimentation/ [email protected]:/home/fedora/Projects/pvm-experimentation/ # /tmp/pvm-experimentation/ contains the rootfs, PVM guest kernel etc.
curl -Lo ~/Downloads/cloud-hypervisor https://github.com/cloud-hypervisor/cloud-hypervisor/releases/download/v38.0/cloud-hypervisor-static
curl -Lo ~/Downloads/ch-remote https://github.com/cloud-hypervisor/cloud-hypervisor/releases/download/v38.0/ch-remote-static

sudo install ~/Downloads/cloud-hypervisor /usr/bin
sudo install ~/Downloads/ch-remote /usr/bin
cd ~/Projects/pvm-experimentation/ && rm -f /tmp/cloud-hypervisor.sock && cloud-hypervisor \
    --api-socket /tmp/cloud-hypervisor.sock \
    --cpus boot=1,max_phys_bits=43 \
    --memory size=512M \
    --kernel vmlinux \
    --console off \
    --serial tty \
    --cmdline "root=/dev/vda console=ttyS0 rw modules=ext4 rootfstype=ext4 pti=off" \
    --disk path=rootfs.ext4 path=initramfs.img
ch-remote --api-socket=/tmp/cloud-hypervisor.sock pause
rm -rf /home/fedora/Downloads/drafter-snapshots && mkdir -p /home/fedora/Downloads/drafter-snapshots && ch-remote --api-socket=/tmp/cloud-hypervisor.sock snapshot file:///home/fedora/Downloads/drafter-snapshots
pkill cloud-hy
rsync -a --progress [email protected]:/home/fedora/Projects/pvm-experimentation/ /tmp/pvm-experimentation/ && rsync -a --progress [email protected]:/home/fedora/Downloads/drafter-snapshots/ /tmp/drafter-snapshots/ && rsync -a --progress /tmp/pvm-experimentation/ [email protected]:/home/fedora/Projects/pvm-experimentation/ && rsync -a --progress /tmp/drafter-snapshots/ [email protected]:/home/fedora/Downloads/drafter-snapshots/
curl -Lo ~/Downloads/cloud-hypervisor https://github.com/cloud-hypervisor/cloud-hypervisor/releases/download/v38.0/cloud-hypervisor-static
curl -Lo ~/Downloads/ch-remote https://github.com/cloud-hypervisor/cloud-hypervisor/releases/download/v38.0/ch-remote-static

sudo install ~/Downloads/cloud-hypervisor /usr/bin
sudo install ~/Downloads/ch-remote /usr/bin
cd ~/Projects/pvm-experimentation/ && rm -f /tmp/cloud-hypervisor.sock && cloud-hypervisor \
    --api-socket /tmp/cloud-hypervisor.sock \
    --restore source_url=file:///home/fedora/Downloads/drafter-snapshots
ch-remote --api-socket=/tmp/cloud-hypervisor.sock resume
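
When reproducing, it helps to watch the kernel log on the resuming host while the last two commands run; because that host can lock up entirely once it panics, the EC2 serial console is the more reliable place to capture the full trace. For a quick check, plain dmesg and ch-remote are enough (nothing PVM-specific is assumed here):

sudo dmesg --follow                                      # on the resuming host, in a second terminal
ch-remote --api-socket=/tmp/cloud-hypervisor.sock info   # after the resume, to confirm the VM state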

Live Migration

The same issue also occurs with Cloud Hypervisor's live migration:

rsync -a --progress /tmp/pvm-experimentation/ [email protected]:/home/fedora/Projects/pvm-experimentation/
cd ~/Projects/pvm-experimentation/ && rm -f /tmp/cloud-hypervisor.sock && cloud-hypervisor \
    --api-socket /tmp/cloud-hypervisor.sock \
    --cpus boot=1,max_phys_bits=43 \
    --memory size=512M \
    --kernel vmlinux \
    --console off \
    --serial tty \
    --cmdline "root=/dev/vda console=ttyS0 rw modules=ext4 rootfstype=ext4 pti=off" \
    --disk path=rootfs.ext4 path=initramfs.img
cd ~/Projects/pvm-experimentation/ && rm -f /tmp/cloud-hypervisor.sock && cloud-hypervisor --api-socket /tmp/cloud-hypervisor.sock
ch-remote --api-socket=/tmp/cloud-hypervisor.sock receive-migration unix:/tmp/mig
socat TCP-LISTEN:6000,reuseaddr UNIX-CLIENT:/tmp/mig
socat UNIX-LISTEN:/tmp/mig,reuseaddr TCP:34.214.167.180:6000
ch-remote --api-socket=/tmp/cloud-hypervisor.sock send-migration unix:/tmp/mig

Full Kernel Panic

[  346.852957] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  346.881359] #PF: supervisor read access in kernel mode
[  346.905711] #PF: error_code(0x0000) - not-present page
[  346.930199] PGD 0 P4D 0 
[  346.941589] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  346.962726] CPU: 0 PID: 1083 Comm: vcpu0 Not tainted 6.7.0-rc6-pvm-host-fedora-baremetal #1
[  347.003980] Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
[  347.033695] RIP: 0010:pvm_vcpu_run+0x415/0x560 [kvm_pvm]
[  347.058894] Code: f8 01 0f 85 ad fe ff ff 48 8b 93 d0 19 00 00 48 8b 83 a8 1a 00 00 80 e6 fd 48 83 bb 18 1b 00 00 00 48 89 93 d0 19 00 00 74 12 <48> 8b 00 25 00 02 00 00 48 09 d0 48 89 83 d0 19 00 00 48 b8 33 00
[  347.151883] RSP: 0000:ffffa0cc01d43d98 EFLAGS: 00010006
[  347.174132] RAX: 0000000000000000 RBX: ffff8bfccae00000 RCX: 000000000000000e
[  347.208690] RDX: 0000000000010012 RSI: fffffe3de5515f58 RDI: fffffe3de5515f58
[  347.242816] RBP: ffffa0cc01d43db0 R08: 0000000000000000 R09: 0000000000000000
[  347.276942] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  347.312976] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8bfccae00038
[  347.346938] FS:  0000000000000000(0000) GS:ffff8bfcf2600000(0000) knlGS:fffff0001f400000
[  347.387738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  347.412850] CR2: 0000000000000000 CR3: 0000000104d04005 CR4: 00000000007706f0
[  347.447733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  347.481654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  347.516084] PKRU: 00000000
[  347.528191] Call Trace:
[  347.539797]  <TASK>
[  347.549621]  ? __die+0x23/0x70
[  347.564375]  ? page_fault_oops+0x16f/0x4e0
[  347.584057]  ? exc_page_fault+0x7e/0x180
[  347.602824]  ? asm_exc_page_fault+0x31/0x50
[  347.622910]  ? pvm_vcpu_run+0x415/0x560 [kvm_pvm]
[  347.645652]  ? pvm_vcpu_run+0x20f/0x560 [kvm_pvm]
[  347.668012]  kvm_arch_vcpu_ioctl_run+0xb48/0x16b0 [kvm]
[  347.693157]  kvm_vcpu_ioctl+0x195/0x700 [kvm]
[  347.713743]  __x64_sys_ioctl+0x97/0xd0
[  347.731614]  do_syscall_64+0x64/0xe0
[  347.748988]  ? exc_page_fault+0x7e/0x180
[  347.767893]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[  347.792071] RIP: 0033:0x7f9a5cb37d21
[  347.809089] Code: 63 f6 48 8d 44 24 60 48 89 54 24 30 48 89 44 24 10 48 8d 44 24 20 48 89 44 24 18 b8 10 00 00 00 c7 44 24 08 10 00 00 00 0f 05 <48> 63 f8 e8 a7 e8 ff ff 48 83 c4 58 c3 41 55 41 54 49 89 fc 55 53
[  347.904898] RSP: 002b:00007f9a3b9ea4c0 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[  347.939281] RAX: ffffffffffffffda RBX: 00007f9a3b9ea6c8 RCX: 00007f9a5cb37d21
[  347.972074] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019
[  348.006023] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[  348.040421] R10: 0000000000000000 R11: 0000000000000206 R12: 7fffffffffffffff
[  348.074372] R13: 00007f9a3b9ea778 R14: 000055555628cf30 R15: 00007f9a3b9ea600
[  348.108752]  </TASK>
[  348.118179] Modules linked in: rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common nfit libnvdimm snd_pcm rapl snd_timer snd ppdev soundcore parport_pc pcspkr ena i2c_piix4 parport kvm_pvm kvm irqbypass loop fuse zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme nvme_core nvme_auth serio_raw
[  348.312194] CR2: 0000000000000000
[  348.323646] ---[ end trace 0000000000000000 ]---
[  348.345924] RIP: 0010:pvm_vcpu_run+0x415/0x560 [kvm_pvm]
[  348.371447] Code: f8 01 0f 85 ad fe ff ff 48 8b 93 d0 19 00 00 48 8b 83 a8 1a 00 00 80 e6 fd 48 83 bb 18 1b 00 00 00 48 89 93 d0 19 00 00 74 12 <48> 8b 00 25 00 02 00 00 48 09 d0 48 89 83 d0 19 00 00 48 b8 33 00
[  348.467224] RSP: 0000:ffffa0cc01d43d98 EFLAGS: 00010006
[  348.488438] RAX: 0000000000000000 RBX: ffff8bfccae00000 RCX: 000000000000000e
[  348.522807] RDX: 0000000000010012 RSI: fffffe3de5515f58 RDI: fffffe3de5515f58
[  348.557155] RBP: ffffa0cc01d43db0 R08: 0000000000000000 R09: 0000000000000000
[  348.591135] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  348.625656] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8bfccae00038
[  348.659672] FS:  0000000000000000(0000) GS:ffff8bfcf2600000(0000) knlGS:fffff0001f400000
[  348.700064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  348.725717] CR2: 0000000000000000 CR3: 0000000104d04005 CR4: 00000000007706f0
[  348.760121] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  348.794529] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  348.828520] PKRU: 00000000
[  348.840602] note: vcpu0[1083] exited with irqs disabled
[  348.866241] note: vcpu0[1083] exited with preempt_count 1
[  350.417480] ------------[ cut here ]------------
[  350.435901] Bad FPU state detected at restore_fpregs_from_fpstate+0x46/0xa0, reinitializing FPU registers.
[  350.435912] WARNING: CPU: 0 PID: 778 at arch/x86/mm/extable.c:126 fixup_exception+0x2df/0x310
[  350.525546] Modules linked in: rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common nfit libnvdimm snd_pcm rapl snd_timer snd ppdev soundcore parport_pc pcspkr ena i2c_piix4 parport kvm_pvm kvm irqbypass loop fuse zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme nvme_core nvme_auth serio_raw
[  350.716715] CPU: 0 PID: 778 Comm: gmain Tainted: G      D            6.7.0-rc6-pvm-host-fedora-baremetal #1
[  350.761717] Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
[  350.790120] RIP: 0010:fixup_exception+0x2df/0x310
[  350.812429] Code: ff ff 31 c9 e9 69 ff ff ff 0f 0b 48 c7 c2 50 fa b8 90 e9 26 ff ff ff 48 c7 c7 b0 73 b1 8f c6 05 8f 92 03 02 01 e8 b1 3d 07 00 <0f> 0b eb ca 0f 0b 48 c7 c2 50 fa b8 90 e9 e5 fd ff ff 83 f8 0c 74
[  350.908252] RSP: 0000:ffffa0cc01d8fcb0 EFLAGS: 00010086
[  350.929556] RAX: 0000000000000000 RBX: ffffffff8fcc7f4c RCX: 0000000000000027
[  350.963955] RDX: ffff8bfcf2620588 RSI: 0000000000000001 RDI: ffff8bfcf2620580
[  350.997973] RBP: 000000000000000d R08: 0000000000000000 R09: ffffa0cc01d8fb38
[  351.032368] R10: 0000000000000003 R11: ffffffff8ff45808 R12: 0000000000000000
[  351.066326] R13: ffffa0cc01d8fda8 R14: 0000000000000000 R15: 0000000000000000
[  351.100720] FS:  00007f9b5f3ff6c0(0000) GS:ffff8bfcf2600000(0000) knlGS:0000000000000000
[  351.141023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  351.166580] CR2: 0000000000000000 CR3: 0000000102fae002 CR4: 00000000007706f0
[  351.201349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  351.235368] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  351.269715] PKRU: 55555554
[  351.281440] Call Trace:
[  351.293442]  <TASK>
[  351.303273]  ? fixup_exception+0x2df/0x310
[  351.323342]  ? __warn+0x80/0x130
[  351.338631]  ? fixup_exception+0x2df/0x310
[  351.358310]  ? report_bug+0x171/0x1a0
[  351.375768]  ? console_unlock+0x77/0x120
[  351.394538]  ? handle_bug+0x3c/0x80
[  351.411059]  ? exc_invalid_op+0x17/0x70
[  351.429396]  ? asm_exc_invalid_op+0x25/0x50
[  351.449463]  ? fixup_exception+0x2df/0x310
[  351.469098]  gp_try_fixup_and_notify+0x1e/0xb0
[  351.490472]  exc_general_protection+0x148/0x420
[  351.512321]  asm_exc_general_protection+0x31/0x50
[  351.534622] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  351.561800] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  351.657447] RSP: 0000:ffffa0cc01d8fe58 EFLAGS: 00010046
[  351.678271] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffff8bfcc9db5500
[  351.713043] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffff8bfcc9db5540
[  351.747040] RBP: ffff8bfcc9db54c0 R08: 0000000000000000 R09: 0000000000000000
[  351.781380] R10: 0000000000000001 R11: 0000000000000100 R12: 0000000000000000
[  351.815694] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  351.849656]  switch_fpu_return+0x4f/0xe0
[  351.867976]  exit_to_user_mode_prepare+0x13b/0x1f0
[  351.891179]  syscall_exit_to_user_mode+0x1b/0x40
[  351.913077]  do_syscall_64+0x70/0xe0
[  351.930075]  ? syscall_exit_to_user_mode+0x2b/0x40
[  351.953265]  ? do_syscall_64+0x70/0xe0
[  351.971118]  ? do_syscall_64+0x70/0xe0
[  351.988993]  ? do_syscall_64+0x70/0xe0
[  352.006873]  ? do_syscall_64+0x70/0xe0
[  352.024931]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[  352.049079] RIP: 0033:0x7f9b608c6b8d
[  352.066035] Code: e5 48 83 ec 20 89 55 ec 48 89 75 f0 48 89 7d f8 e8 98 2e f8 ff 8b 55 ec 48 8b 75 f0 41 89 c0 48 8b 7d f8 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 89 45 f8 e8 f0 2e f8 ff 8b 45 f8
[  352.162344] RSP: 002b:00007f9b5f3fdfa0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
[  352.196837] RAX: 0000000000000000 RBX: 000055a0d99351e0 RCX: 00007f9b608c6b8d
[  352.229686] RDX: 0000000000000f9c RSI: 0000000000000002 RDI: 000055a0d9935360
[  352.263651] RBP: 00007f9b5f3fdfc0 R08: 0000000000000000 R09: 000000007fffffff
[  352.297737] R10: 000055a0d99351e0 R11: 0000000000000293 R12: 000000007fffffff
[  352.332108] R13: 0000000000000f9c R14: 0000000000000002 R15: 000055a0d9935360
[  352.366318]  </TASK>
[  352.376035] ---[ end trace 0000000000000000 ]---
[  352.398636] BUG: TASK stack guard page was hit at 0000000012290d45 (stack is 00000000bb024c83..000000003c5a8904)
[  352.398639] stack guard page: 0000 [#2] PREEMPT SMP NOPTI
[  352.398641] CPU: 0 PID: 778 Comm: gmain Tainted: G      D W          6.7.0-rc6-pvm-host-fedora-baremetal #1
[  352.398643] Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
[  352.398644] RIP: 0010:exc_general_protection+0x23/0x420
[  352.398647] Code: 90 90 90 90 90 90 90 66 0f 1f 00 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 78 65 48 8b 05 34 c3 01 71 48 89 44 24 70 31 c0 <e8> 48 4a 00 00 41 89 c4 48 b8 67 65 6e 65 72 61 6c 20 48 c7 44 24
[  352.398648] RSP: 0000:ffffa0cc01d8bfc8 EFLAGS: 00010046
[  352.398650] RAX: 0000000000000000 RBX: ffffa0cc01d8c068 RCX: ffffffff8f202277
[  352.398651] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0cc01d8c068
[  352.398651] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  352.398652] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398652] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  352.398653] FS:  00007f9b5f3ff6c0(0000) GS:ffff8bfcf2600000(0000) knlGS:0000000000000000
[  352.398654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  352.398655] CR2: ffffa0cc01d8bfb8 CR3: 0000000102fae002 CR4: 00000000007706f0
[  352.398658] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  352.398659] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  352.398659] PKRU: 55555554
[  352.398660] Call Trace:
[  352.398661]  <#DF>
[  352.398662]  ? die+0x36/0x90
[  352.398666]  ? handle_stack_overflow+0x4d/0x60
[  352.398670]  ? exc_double_fault+0x123/0x1b0
[  352.398671]  ? asm_exc_double_fault+0x23/0x30
[  352.398674]  ? asm_load_gs_index+0x7/0x40
[  352.398677]  ? exc_general_protection+0x23/0x420
[  352.398678]  </#DF>
[  352.398679]  <TASK>
[  352.398679]  asm_exc_general_protection+0x31/0x50
[  352.398681] RIP: 0010:restore_fpregs_from_fpstate+0x42/0xa0
[  352.398684] Code: cc cc cc db e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 <48> 0f c7 1f 48 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae
[  352.398684] RSP: 0000:ffffa0cc01d8c110 EFLAGS: 00010046
[  352.398685] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398686] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398687] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398687] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398688] R13: ffffa0cc01d8c228 R14: 0000000000000000 R15: 0000000000000000
[  352.398690]  fixup_exception+0x2b2/0x310
[  352.398692]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398694]  exc_general_protection+0x148/0x420
[  352.398696]  asm_exc_general_protection+0x31/0x50
[  352.398698] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398699] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398700] RSP: 0000:ffffa0cc01d8c2d0 EFLAGS: 00010046
[  352.398700] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398701] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398702] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398702] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398703] R13: ffffa0cc01d8c3e8 R14: 0000000000000000 R15: 0000000000000000
[  352.398704]  fixup_exception+0x2b2/0x310
[  352.398705]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398707]  exc_general_protection+0x148/0x420
[  352.398709]  asm_exc_general_protection+0x31/0x50
[  352.398711] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398712] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398713] RSP: 0000:ffffa0cc01d8c490 EFLAGS: 00010046
[  352.398713] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398714] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398714] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398715] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398715] R13: ffffa0cc01d8c5a8 R14: 0000000000000000 R15: 0000000000000000
[  352.398716]  fixup_exception+0x2b2/0x310
[  352.398718]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398719]  exc_general_protection+0x148/0x420
[  352.398721]  asm_exc_general_protection+0x31/0x50
[  352.398723] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398724] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398725] RSP: 0000:ffffa0cc01d8c650 EFLAGS: 00010046
[  352.398725] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398726] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398726] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398727] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398727] R13: ffffa0cc01d8c768 R14: 0000000000000000 R15: 0000000000000000
[  352.398729]  fixup_exception+0x2b2/0x310
[  352.398730]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398732]  exc_general_protection+0x148/0x420
[  352.398733]  asm_exc_general_protection+0x31/0x50
[  352.398735] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398736] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398737] RSP: 0000:ffffa0cc01d8c810 EFLAGS: 00010046
[  352.398738] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398738] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398739] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398739] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398740] R13: ffffa0cc01d8c928 R14: 0000000000000000 R15: 0000000000000000
[  352.398741]  fixup_exception+0x2b2/0x310
[  352.398742]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398744]  exc_general_protection+0x148/0x420
[  352.398745]  asm_exc_general_protection+0x31/0x50
[  352.398747] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398748] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398749] RSP: 0000:ffffa0cc01d8c9d0 EFLAGS: 00010046
[  352.398750] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398750] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398751] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398752] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398752] R13: ffffa0cc01d8cae8 R14: 0000000000000000 R15: 0000000000000000
[  352.398754]  fixup_exception+0x2b2/0x310
[  352.398755]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398757]  exc_general_protection+0x148/0x420
[  352.398758]  asm_exc_general_protection+0x31/0x50
[  352.398760] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398761] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398762] RSP: 0000:ffffa0cc01d8cb90 EFLAGS: 00010046
[  352.398762] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398763] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398763] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398764] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398764] R13: ffffa0cc01d8cca8 R14: 0000000000000000 R15: 0000000000000000
[  352.398766]  fixup_exception+0x2b2/0x310
[  352.398767]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398768]  exc_general_protection+0x148/0x420
[  352.398770]  asm_exc_general_protection+0x31/0x50
[  352.398772] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398773] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398774] RSP: 0000:ffffa0cc01d8cd50 EFLAGS: 00010046
[  352.398774] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398775] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398775] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398776] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398776] R13: ffffa0cc01d8ce68 R14: 0000000000000000 R15: 0000000000000000
[  352.398777]  fixup_exception+0x2b2/0x310
[  352.398778]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398780]  exc_general_protection+0x148/0x420
[  352.398782]  asm_exc_general_protection+0x31/0x50
[  352.398784] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398785] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398786] RSP: 0000:ffffa0cc01d8cf10 EFLAGS: 00010046
[  352.398786] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398787] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398788] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398788] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398789] R13: ffffa0cc01d8d028 R14: 0000000000000000 R15: 0000000000000000
[  352.398790]  fixup_exception+0x2b2/0x310
[  352.398791]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398793]  exc_general_protection+0x148/0x420
[  352.398795]  asm_exc_general_protection+0x31/0x50
[  352.398796] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398797] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398798] RSP: 0000:ffffa0cc01d8d0d0 EFLAGS: 00010046
[  352.398799] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398800] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398800] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398801] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398801] R13: ffffa0cc01d8d1e8 R14: 0000000000000000 R15: 0000000000000000
[  352.398803]  fixup_exception+0x2b2/0x310
[  352.398804]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398806]  exc_general_protection+0x148/0x420
[  352.398807]  asm_exc_general_protection+0x31/0x50
[  352.398809] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398810] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398811] RSP: 0000:ffffa0cc01d8d290 EFLAGS: 00010046
[  352.398811] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398812] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398812] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398813] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398813] R13: ffffa0cc01d8d3a8 R14: 0000000000000000 R15: 0000000000000000
[  352.398815]  fixup_exception+0x2b2/0x310
[  352.398816]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398818]  exc_general_protection+0x148/0x420
[  352.398819]  asm_exc_general_protection+0x31/0x50
[  352.398821] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398822] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398823] RSP: 0000:ffffa0cc01d8d450 EFLAGS: 00010046
[  352.398823] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398824] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398824] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398825] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398825] R13: ffffa0cc01d8d568 R14: 0000000000000000 R15: 0000000000000000
[  352.398827]  fixup_exception+0x2b2/0x310
[  352.398828]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398829]  exc_general_protection+0x148/0x420
[  352.398831]  asm_exc_general_protection+0x31/0x50
[  352.398833] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398834] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398835] RSP: 0000:ffffa0cc01d8d610 EFLAGS: 00010046
[  352.398835] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398836] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398837] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398837] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398838] R13: ffffa0cc01d8d728 R14: 0000000000000000 R15: 0000000000000000
[  352.398841]  fixup_exception+0x2b2/0x310
[  352.398843]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398845]  exc_general_protection+0x148/0x420
[  352.398848]  asm_exc_general_protection+0x31/0x50
[  352.398851] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398852] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398853] RSP: 0000:ffffa0cc01d8d7d0 EFLAGS: 00010046
[  352.398854] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398854] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398855] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398855] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398855] R13: ffffa0cc01d8d8e8 R14: 0000000000000000 R15: 0000000000000000
[  352.398857]  fixup_exception+0x2b2/0x310
[  352.398858]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398860]  exc_general_protection+0x148/0x420
[  352.398861]  asm_exc_general_protection+0x31/0x50
[  352.398863] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398864] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398865] RSP: 0000:ffffa0cc01d8d990 EFLAGS: 00010046
[  352.398866] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398866] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398867] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398867] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398867] R13: ffffa0cc01d8daa8 R14: 0000000000000000 R15: 0000000000000000
[  352.398869]  fixup_exception+0x2b2/0x310
[  352.398870]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398872]  exc_general_protection+0x148/0x420
[  352.398873]  asm_exc_general_protection+0x31/0x50
[  352.398875] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398876] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398877] RSP: 0000:ffffa0cc01d8db50 EFLAGS: 00010046
[  352.398877] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398878] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398878] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398879] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398879] R13: ffffa0cc01d8dc68 R14: 0000000000000000 R15: 0000000000000000
[  352.398881]  fixup_exception+0x2b2/0x310
[  352.398882]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398884]  exc_general_protection+0x148/0x420
[  352.398885]  asm_exc_general_protection+0x31/0x50
[  352.398887] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398888] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398889] RSP: 0000:ffffa0cc01d8dd10 EFLAGS: 00010046
[  352.398889] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398890] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398890] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398891] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398891] R13: ffffa0cc01d8de28 R14: 0000000000000000 R15: 0000000000000000
[  352.398893]  fixup_exception+0x2b2/0x310
[  352.398894]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398896]  exc_general_protection+0x148/0x420
[  352.398897]  asm_exc_general_protection+0x31/0x50
[  352.398899] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398900] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398901] RSP: 0000:ffffa0cc01d8ded0 EFLAGS: 00010046
[  352.398901] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398902] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398902] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398903] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398903] R13: ffffa0cc01d8dfe8 R14: 0000000000000000 R15: 0000000000000000
[  352.398905]  fixup_exception+0x2b2/0x310
[  352.398906]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398907]  exc_general_protection+0x148/0x420
[  352.398909]  asm_exc_general_protection+0x31/0x50
[  352.398911] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398912] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398913] RSP: 0000:ffffa0cc01d8e090 EFLAGS: 00010046
[  352.398913] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398914] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398914] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398915] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398915] R13: ffffa0cc01d8e1a8 R14: 0000000000000000 R15: 0000000000000000
[  352.398917]  fixup_exception+0x2b2/0x310
[  352.398918]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398919]  exc_general_protection+0x148/0x420
[  352.398921]  asm_exc_general_protection+0x31/0x50
[  352.398923] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398924] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398924] RSP: 0000:ffffa0cc01d8e250 EFLAGS: 00010046
[  352.398925] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398926] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398926] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398927] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398927] R13: ffffa0cc01d8e368 R14: 0000000000000000 R15: 0000000000000000
[  352.398928]  fixup_exception+0x2b2/0x310
[  352.398929]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398931]  exc_general_protection+0x148/0x420
[  352.398933]  asm_exc_general_protection+0x31/0x50
[  352.398934] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398936] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398936] RSP: 0000:ffffa0cc01d8e410 EFLAGS: 00010046
[  352.398937] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398937] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398938] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398938] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398939] R13: ffffa0cc01d8e528 R14: 0000000000000000 R15: 0000000000000000
[  352.398940]  fixup_exception+0x2b2/0x310
[  352.398941]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398943]  exc_general_protection+0x148/0x420
[  352.398945]  asm_exc_general_protection+0x31/0x50
[  352.398946] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398948] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398948] RSP: 0000:ffffa0cc01d8e5d0 EFLAGS: 00010046
[  352.398949] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398949] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398950] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398950] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398951] R13: ffffa0cc01d8e6e8 R14: 0000000000000000 R15: 0000000000000000
[  352.398952]  fixup_exception+0x2b2/0x310
[  352.398953]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398955]  exc_general_protection+0x148/0x420
[  352.398957]  asm_exc_general_protection+0x31/0x50
[  352.398958] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398960] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398960] RSP: 0000:ffffa0cc01d8e790 EFLAGS: 00010046
[  352.398961] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398961] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398962] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398962] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398963] R13: ffffa0cc01d8e8a8 R14: 0000000000000000 R15: 0000000000000000
[  352.398964]  fixup_exception+0x2b2/0x310
[  352.398965]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398967]  exc_general_protection+0x148/0x420
[  352.398969]  asm_exc_general_protection+0x31/0x50
[  352.398970] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398972] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398972] RSP: 0000:ffffa0cc01d8e950 EFLAGS: 00010046
[  352.398973] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398973] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398974] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398974] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398975] R13: ffffa0cc01d8ea68 R14: 0000000000000000 R15: 0000000000000000
[  352.398976]  fixup_exception+0x2b2/0x310
[  352.398977]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398979]  exc_general_protection+0x148/0x420
[  352.398981]  asm_exc_general_protection+0x31/0x50
[  352.398982] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398983] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398984] RSP: 0000:ffffa0cc01d8eb10 EFLAGS: 00010046
[  352.398985] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398985] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398986] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398986] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398987] R13: ffffa0cc01d8ec28 R14: 0000000000000000 R15: 0000000000000000
[  352.398988]  fixup_exception+0x2b2/0x310
[  352.398989]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.398991]  exc_general_protection+0x148/0x420
[  352.398993]  asm_exc_general_protection+0x31/0x50
[  352.398994] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.398995] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.398996] RSP: 0000:ffffa0cc01d8ecd0 EFLAGS: 00010046
[  352.398997] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.398997] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.398998] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.398998] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.398999] R13: ffffa0cc01d8ede8 R14: 0000000000000000 R15: 0000000000000000
[  352.399000]  fixup_exception+0x2b2/0x310
[  352.399001]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399003]  exc_general_protection+0x148/0x420
[  352.399004]  asm_exc_general_protection+0x31/0x50
[  352.399006] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399007] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399008] RSP: 0000:ffffa0cc01d8ee90 EFLAGS: 00010046
[  352.399009] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399009] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399010] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399010] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399010] R13: ffffa0cc01d8efa8 R14: 0000000000000000 R15: 0000000000000000
[  352.399012]  fixup_exception+0x2b2/0x310
[  352.399013]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399015]  exc_general_protection+0x148/0x420
[  352.399016]  asm_exc_general_protection+0x31/0x50
[  352.399018] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399019] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399020] RSP: 0000:ffffa0cc01d8f050 EFLAGS: 00010046
[  352.399020] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399021] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399021] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399022] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399022] R13: ffffa0cc01d8f168 R14: 0000000000000000 R15: 0000000000000000
[  352.399024]  fixup_exception+0x2b2/0x310
[  352.399025]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399027]  exc_general_protection+0x148/0x420
[  352.399028]  asm_exc_general_protection+0x31/0x50
[  352.399030] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399031] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399032] RSP: 0000:ffffa0cc01d8f210 EFLAGS: 00010046
[  352.399032] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399033] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399033] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399034] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399034] R13: ffffa0cc01d8f328 R14: 0000000000000000 R15: 0000000000000000
[  352.399036]  fixup_exception+0x2b2/0x310
[  352.399037]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399038]  exc_general_protection+0x148/0x420
[  352.399040]  asm_exc_general_protection+0x31/0x50
[  352.399042] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399043] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399044] RSP: 0000:ffffa0cc01d8f3d0 EFLAGS: 00010046
[  352.399044] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399045] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399045] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399046] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399046] R13: ffffa0cc01d8f4e8 R14: 0000000000000000 R15: 0000000000000000
[  352.399048]  fixup_exception+0x2b2/0x310
[  352.399049]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399050]  exc_general_protection+0x148/0x420
[  352.399052]  asm_exc_general_protection+0x31/0x50
[  352.399054] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399055] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399056] RSP: 0000:ffffa0cc01d8f590 EFLAGS: 00010046
[  352.399056] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399057] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399057] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399058] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399058] R13: ffffa0cc01d8f6a8 R14: 0000000000000000 R15: 0000000000000000
[  352.399059]  fixup_exception+0x2b2/0x310
[  352.399060]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399062]  exc_general_protection+0x148/0x420
[  352.399064]  asm_exc_general_protection+0x31/0x50
[  352.399066] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399067] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399067] RSP: 0000:ffffa0cc01d8f750 EFLAGS: 00010046
[  352.399068] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399069] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399069] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399069] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399070] R13: ffffa0cc01d8f868 R14: 0000000000000000 R15: 0000000000000000
[  352.399071]  fixup_exception+0x2b2/0x310
[  352.399072]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399074]  exc_general_protection+0x148/0x420
[  352.399076]  asm_exc_general_protection+0x31/0x50
[  352.399077] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399079] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399079] RSP: 0000:ffffa0cc01d8f910 EFLAGS: 00010046
[  352.399080] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399080] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399081] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399081] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399082] R13: ffffa0cc01d8fa28 R14: 0000000000000000 R15: 0000000000000000
[  352.399083]  fixup_exception+0x2b2/0x310
[  352.399084]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399086]  exc_general_protection+0x148/0x420
[  352.399088]  asm_exc_general_protection+0x31/0x50
[  352.399089] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399091] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399091] RSP: 0000:ffffa0cc01d8fad0 EFLAGS: 00010046
[  352.399092] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399092] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399093] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
[  352.399093] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  352.399094] R13: ffffa0cc01d8fbe8 R14: 0000000000000000 R15: 0000000000000000
[  352.399095]  fixup_exception+0x2b2/0x310
[  352.399096]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399098]  exc_general_protection+0x148/0x420
[  352.399099]  ? fixup_exception+0x2df/0x310
[  352.399100]  asm_exc_general_protection+0x31/0x50
[  352.399102] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399103] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399104] RSP: 0000:ffffa0cc01d8fc90 EFLAGS: 00010046
[  352.399104] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffffffff8fbec700
[  352.399105] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffffffff8fbec740
[  352.399105] RBP: 000000000000000d R08: 0000000000000000 R09: ffffa0cc01d8fb38
[  352.399106] R10: 0000000000000003 R11: ffffffff8ff45808 R12: 0000000000000000
[  352.399107] R13: ffffa0cc01d8fda8 R14: 0000000000000000 R15: 0000000000000000
[  352.399108]  fixup_exception+0x2b2/0x310
[  352.399109]  gp_try_fixup_and_notify+0x1e/0xb0
[  352.399111]  exc_general_protection+0x148/0x420
[  352.399112]  asm_exc_general_protection+0x31/0x50
[  352.399114] RIP: 0010:restore_fpregs_from_fpstate+0x46/0xa0
[  352.399115] Code: e2 0f 77 db 04 24 0f 1f 44 00 00 48 8b 0c 24 66 90 48 8b 05 e4 54 ba 01 48 8d 79 40 48 21 d8 48 89 c2 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 5d c3 cc cc cc cc 48 8b 04 24 48 0f ae 48 40 48 83
[  352.399116] RSP: 0000:ffffa0cc01d8fe58 EFLAGS: 00010046
[  352.399117] RAX: 00000000000000ff RBX: 0000000000060cff RCX: ffff8bfcc9db5500
[  352.399117] RDX: 0000000000000000 RSI: 0000000000060cff RDI: ffff8bfcc9db5540
[  352.399118] RBP: ffff8bfcc9db54c0 R08: 0000000000000000 R09: 0000000000000000
[  352.399118] R10: 0000000000000001 R11: 0000000000000100 R12: 0000000000000000
[  352.399119] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  352.399120]  switch_fpu_return+0x4f/0xe0
[  352.399121]  exit_to_user_mode_prepare+0x13b/0x1f0
[  352.399125]  syscall_exit_to_user_mode+0x1b/0x40
[  352.399127]  do_syscall_64+0x70/0xe0
[  352.399129]  ? syscall_exit_to_user_mode+0x2b/0x40
[  352.399131]  ? do_syscall_64+0x70/0xe0
[  352.399133]  ? do_syscall_64+0x70/0xe0
[  352.399134]  ? do_syscall_64+0x70/0xe0
[  352.399136]  ? do_syscall_64+0x70/0xe0
[  352.399137]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[  352.399139] RIP: 0033:0x7f9b608c6b8d
[  352.399141] Code: e5 48 83 ec 20 89 55 ec 48 89 75 f0 48 89 7d f8 e8 98 2e f8 ff 8b 55 ec 48 8b 75 f0 41 89 c0 48 8b 7d f8 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 89 45 f8 e8 f0 2e f8 ff 8b 45 f8
[  352.399141] RSP: 002b:00007f9b5f3fdfa0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
[  352.399143] RAX: 0000000000000000 RBX: 000055a0d99351e0 RCX: 00007f9b608c6b8d
[  352.399143] RDX: 0000000000000f9c RSI: 0000000000000002 RDI: 000055a0d9935360
[  352.399144] RBP: 00007f9b5f3fdfc0 R08: 0000000000000000 R09: 000000007fffffff
[  352.399144] R10: 000055a0d99351e0 R11: 0000000000000293 R12: 000000007fffffff
[  352.399145] R13: 0000000000000f9c R14: 0000000000000002 R15: 000055a0d9935360
[  352.399146]  </TASK>
[  352.399147] Modules linked in: rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common nfit libnvdimm snd_pcm rapl snd_timer snd ppdev soundcore parport_pc pcspkr ena i2c_piix4 parport kvm_pvm kvm irqbypass loop fuse zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme nvme_core nvme_auth serio_raw
[  352.399163] ---[ end trace 0000000000000000 ]---
[  352.399164] RIP: 0010:pvm_vcpu_run+0x415/0x560 [kvm_pvm]
[  352.399170] Code: f8 01 0f 85 ad fe ff ff 48 8b 93 d0 19 00 00 48 8b 83 a8 1a 00 00 80 e6 fd 48 83 bb 18 1b 00 00 00 48 89 93 d0 19 00 00 74 12 <48> 8b 00 25 00 02 00 00 48 09 d0 48 89 83 d0 19 00 00 48 b8 33 00
[  352.399171] RSP: 0000:ffffa0cc01d43d98 EFLAGS: 00010006
[  352.399172] RAX: 0000000000000000 RBX: ffff8bfccae00000 RCX: 000000000000000e
[  352.399172] RDX: 0000000000010012 RSI: fffffe3de5515f58 RDI: fffffe3de5515f58
[  352.399173] RBP: ffffa0cc01d43db0 R08: 0000000000000000 R09: 0000000000000000
[  352.399173] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  352.399174] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8bfccae00038
[  352.399174] FS:  00007f9b5f3ff6c0(0000) GS:ffff8bfcf2600000(0000) knlGS:0000000000000000
[  352.399175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  352.399176] CR2: ffffa0cc01d8bfb8 CR3: 0000000102fae002 CR4: 00000000007706f0
[  352.399177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  352.399177] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  352.399178] PKRU: 55555554
[  352.399178] Kernel panic - not syncing: Fatal exception in interrupt
[  352.399367] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  369.043201] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Cloud Hypervisor Error Log

cloud-hypervisor: 235.709080ms: <vmm> WARN:hypervisor/src/kvm/mod.rs:2113 -- Detected faulty MSR 0x4b564d02 while setting MSRs
cloud-hypervisor: 235.757064ms: <vmm> WARN:hypervisor/src/kvm/mod.rs:2113 -- Detected faulty MSR 0x4b564d04 while setting MSRs
cloud-hypervisor: 235.778630ms: <vmm> WARN:hypervisor/src/kvm/mod.rs:2113 -- Detected faulty MSR 0x4b564df0 while setting MSRs
cloud-hypervisor: 235.800553ms: <vmm> WARN:hypervisor/src/kvm/mod.rs:2113 -- Detected faulty MSR 0x4b564df1 while setting MSRs

Firecracker Error Log

Received Error. Status code: 400 Bad Request. Message: Load snapshot error: Failed to restore from snapshot: Failed to build microVM from snapshot: Failed to restore vCPUs: Failed to run action on vcpu: Failed to set all KVM MSRs for this vCPU. Only a partial write was done.
bysui added the bug and live migration labels on Apr 22, 2024
bysui (Collaborator) commented Apr 22, 2024

Hi @pojntfx, thank you for your CI/CD testing of PVM. We really appreciate it.

There is currently a major problem with live migration between different hosts. As the Cloud Hypervisor error log shows, restoring MSR_PVM_LINEAR_ADDRESS_RANGE (MSR 0x4b564df0) failed. The reason is that the hypervisor needs to reserve an address range in the host's vmalloc area, and different hosts may end up with different allocations, which makes the MSR restore fail. Since there is not enough room in the 4-level paging mode kernel address space, it is hard to find a fixed range that is always available, so in my testing environment I had to employ the workaround below to reserve the same range on the second host:

diff --git a/arch/x86/kvm/pvm/host_mmu.c b/arch/x86/kvm/pvm/host_mmu.c
index 35e97f4f7055..047e7679fe2d 100644
--- a/arch/x86/kvm/pvm/host_mmu.c
+++ b/arch/x86/kvm/pvm/host_mmu.c
@@ -35,8 +35,11 @@ static int __init guest_address_space_init(void)
                return -1;
        }

-       pvm_va_range_l4 = get_vm_area_align(DEFAULT_RANGE_L4_SIZE, PT_L4_SIZE,
-                         VM_ALLOC|VM_NO_GUARD);
+       //pvm_va_range_l4 = get_vm_area_align(DEFAULT_RANGE_L4_SIZE, PT_L4_SIZE,
+       //                VM_ALLOC|VM_NO_GUARD);
+       pvm_va_range_l4 = __get_vm_area_caller(DEFAULT_RANGE_L4_SIZE, VM_ALLOC|VM_NO_GUARD,
+                                              VMALLOC_END - DEFAULT_RANGE_L4_SIZE, VMALLOC_END,
+                                              __builtin_return_address(0));
        if (!pvm_va_range_l4)
                return -1;

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6e4b95f24bd8..bf89f9184b62 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2622,6 +2622,7 @@ struct vm_struct *__get_vm_area_caller(unsigned long size, unsigned long flags,
        return __get_vm_area_node(size, 1, PAGE_SHIFT, flags, start, end,
                                  NUMA_NO_NODE, GFP_KERNEL, caller);
 }
+EXPORT_SYMBOL_GPL(__get_vm_area_caller);

I've tried to use the provided host config file, but I cannot reproduce the issue. Without the workaround, the restore fails and a new VM boots instead; with the workaround in place, the restore succeeds on the second host.
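
A quick way to compare the reserved range on both hosts is to look at the vmalloc allocations after loading kvm-pvm (a sketch: the caller name recorded in /proc/vmallocinfo depends on which allocation helper is used, so the grep pattern may need adjusting):

sudo grep -iE 'pvm|guest_address_space_init' /proc/vmallocinfo   # run on both hosts and compare the printed address ranges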

From the host kernel crash log, it appears that the first problem is due to a NULL pointer access in pvm_vcpu_run(). Could you please provide the 'kvm-pvm.ko' file so that I can examine which variable is being accessed?

Additionally, based on the log, it seems that the issue may be related to XSAVE features. The faulting instruction in restore_fpregs_from_fpstate() is xrstors64, which causes the FPU restore to fail and triggers a #GP exception. Furthermore, in fixup_exception(), there is an attempt to restore the init_fpstate FPU context, which also calls restore_fpregs_from_fpstate(). This results in repeated faults and eventually leads to a task stack overflow. To gather more debugging information, could you please add the following code?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29413cb2f090..72b2a0964df8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5517,12 +5517,20 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
         */
        u64 supported_xcr0 = vcpu->arch.guest_supported_xcr0 |
                             XFEATURE_MASK_FPSSE;
+       union fpregs_state *ustate = (void *)state;

        if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
                return;

        fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
                                       supported_xcr0, vcpu->arch.pkru);
+
+       pr_info("during getting:\n guest xcr0: %llx, host xcr0: %llx, supported_xcr0: %llx\n",
+               vcpu->arch.xcr0, host_xcr0, supported_xcr0);
+       pr_info("guest pkru: %x, host pkru: %x\n",
+               vcpu->arch.pkru, vcpu->arch.host_pkru);
+       pr_info("xfeatures: %llx, xcomp_bv: %llx\n",
+               ustate->xsave.header.xfeatures, ustate->xsave.header.xcomp_bv);
 }

 static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
@@ -5535,9 +5543,17 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
                                        struct kvm_xsave *guest_xsave)
 {
+       union fpregs_state *ustate = (void *)guest_xsave->region;
+
        if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
                return 0;

+       pr_info("during setting:\n guest xcr0: %llx, host xcr0: %llx, supported_xcr0: %llx\n",
+               vcpu->arch.xcr0, host_xcr0, kvm_caps.supported_xcr0);
+       pr_info("guest pkru: %x, host pkru: %x\n",
+               vcpu->arch.pkru, vcpu->arch.host_pkru);
+       pr_info("xfeatures: %llx, xcomp_bv: %llx\n",
+               ustate->xsave.header.xfeatures, ustate->xsave.header.xcomp_bv);
        return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
                                              guest_xsave->region,
                                              kvm_caps.supported_xcr0,
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 271dcb2deabc..21403b6e12a6 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -6,6 +6,7 @@
 #include <xen/xen.h>

 #include <asm/fpu/api.h>
+#include <asm/fpu/xcr.h>
 #include <asm/sev.h>
 #include <asm/traps.h>
 #include <asm/kdebug.h>
@@ -121,8 +122,18 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
 static bool ex_handler_fprestore(const struct exception_table_entry *fixup,
                                 struct pt_regs *regs)
 {
+       static bool once;
        regs->ip = ex_fixup_addr(fixup);

+       if (boot_cpu_has(X86_FEATURE_XSAVE) && !once) {
+               struct xregs_state *state = (void *)regs->di;
+
+               once = true;
+               pr_info("xcr0 is %llx\n", xgetbv(XCR_XFEATURE_ENABLED_MASK));
+               pr_info("xfeatures: %llx, xcomp_bv: %llx\n",
+                       state->header.xfeatures, state->header.xcomp_bv);
+       }
+
        WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
                  (void *)instruction_pointer(regs));

pojntfx added a commit to loopholelabs/linux-pvm-ci that referenced this issue Apr 23, 2024
@pojntfx
Author

pojntfx commented Apr 24, 2024

Hi! Thanks a lot for your suggestions! Sadly, it doesn't look like I can modprobe kvm-pvm with your patches (debugging + the workaround) applied:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.6Gi       396Mi       2.9Gi       620Ki       576Mi       3.2Gi
Swap:          3.6Gi          0B       3.6Gi
$ sudo modprobe kvm-pvm
modprobe: ERROR: could not insert 'kvm_pvm': Cannot allocate memory
# Kernel log
[  510.203704] vmap allocation for size 17592186044416 failed: use vmalloc=<size> to increase size

I've added the patches you've posted above to the CI's AWS configs (see https://github.com/loopholelabs/linux-pvm-ci/blob/master/patches/add-xsave-debug-logs.patch and https://github.com/loopholelabs/linux-pvm-ci/blob/master/patches/use-fixed-pvm-range.patch); to reproduce, run:

uname -r # Get installed kernels - make sure that there is at least one more kernel than the one you're removing!
sudo rpm -e kernel-6.7.0_rc6_pvm_host_fedora_aws-1.x86_64
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
sudo dnf clean all
sudo dnf upgrade -y --refresh
sudo dnf install -y kernel-6.7.0_rc6_pvm_host_fedora_aws-1.x86_64
sudo grubby --set-default /boot/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws
sudo grubby --args="pti=off nokaslr lapic=notscdeadline" --update-kernel /boot/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws
sudo tee /etc/modprobe.d/kvm-intel-amd-blacklist.conf <<EOF
blacklist kvm-intel
blacklist kvm-amd
EOF
echo "kvm-pvm" | sudo tee /etc/modules-load.d/kvm-pvm.conf
sudo reboot

I've also tried it with a larger (512M) vmalloc value, which doesn't seem to change anything (if I read the kernel logs correctly, the failed allocation is 17592186044416 bytes, i.e. 16 TiB, so it's trying to reserve a much larger region than that anyway):

$ cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws root=UUID=2f1b4fb2-54ec-4124-a3b4-1776614e5e4c ro rootflags=subvol=root no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 pti=off vmalloc=512M

Let me know if there is anything I can do to help debugging!

@bysui
Collaborator

bysui commented Apr 26, 2024

Hi! Thanks a lot for your suggestions! Sadly, it doesn't look like I can modprobe kvm-pvm with your patches (debugging + the workaround) applied:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.6Gi       396Mi       2.9Gi       620Ki       576Mi       3.2Gi
Swap:          3.6Gi          0B       3.6Gi
$ sudo modprobe kvm-pvm
modprobe: ERROR: could not insert 'kvm_pvm': Cannot allocate memory
# Kernel log
[  510.203704] vmap allocation for size 17592186044416 failed: use vmalloc=<size> to increase size

I've added the patches you've posted above to the CI's AWS configs (see https://github.com/loopholelabs/linux-pvm-ci/blob/master/patches/add-xsave-debug-logs.patch and https://github.com/loopholelabs/linux-pvm-ci/blob/master/patches/use-fixed-pvm-range.patch); to reproduce, run:

uname -r # Get installed kernels - make sure that there is at least one more kernel than the one you're removing!
sudo rpm -e kernel-6.7.0_rc6_pvm_host_fedora_baremetal-1.x86_64
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
sudo dnf clean all
sudo dnf upgrade --refresh
sudo dnf install -y kernel-6.7.0_rc6_pvm_host_fedora_aws-1.x86_64
sudo grubby --set-default /boot/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws
sudo grubby --args="pti=off" --update-kernel /boot/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws
sudo tee /etc/modprobe.d/kvm-intel-amd-blacklist.conf <<EOF
blacklist kvm-intel
blacklist kvm-amd
EOF
echo "kvm-pvm" | sudo tee /etc/modules-load.d/kvm-pvm.conf
sudo reboot

I've also tried it with a larger (512M) vmalloc value, which doesn't seem to change anything (if I read the kernel logs correctly it's trying to allocate a much larger region than that anyways):

$ cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws root=UUID=2f1b4fb2-54ec-4124-a3b4-1776614e5e4c ro rootflags=subvol=root no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 pti=off vmalloc=512M

Let me know if there is anything I can do to help debugging!

Sorry, I made a mistake and mixed things up when I wrote the reply. The workaround works on my old kernel version. I forgot that in the current kernel version percpu allocation in the vmalloc area occurs from top to bottom, so the __get_vm_area_caller() call will fail.

The correct workaround looks like this, based on the latest 'pvm' branch with 5-level paging mode support. Additionally, you have to disable KASLR for the host kernel by adding "nokaslr" to the boot command line. I have tested it on the current kernel version. Note that this is just a workaround for snapshot save/restore between different hosts (or even on the same host, where the allowed range can change across module load/unload cycles).

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1526747bedf2..0a0a13784403 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -713,6 +713,27 @@ static void __init x86_report_nx(void)
        }
 }

+#ifdef CONFIG_X86_64
+static void __init x86_reserve_vmalloc_range(void)
+{
+       static struct vm_struct pvm;
+       unsigned long size = 32UL << 39;
+
+       if (pgtable_l5_enabled())
+               size = 32UL << 48;
+
+       pvm.addr = (void *)(VMALLOC_END + 1 - size);
+       pvm.size = size;
+       pvm.flags = VM_ALLOC | VM_NO_GUARD;
+
+       vm_area_add_early(&pvm);
+}
+#else
+static void __init x86_reserve_vmalloc_range(void)
+{
+}
+#endif
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -955,6 +976,7 @@ void __init setup_arch(char **cmdline_p)
         * defined and before each memory section base is used.
         */
        kernel_randomize_memory();
+       x86_reserve_vmalloc_range();

 #ifdef CONFIG_X86_32
        /* max_low_pfn get updated here */
diff --git a/arch/x86/kvm/pvm/host_mmu.c b/arch/x86/kvm/pvm/host_mmu.c
index a60a7c78ca5a..3bda09f1de69 100644
--- a/arch/x86/kvm/pvm/host_mmu.c
+++ b/arch/x86/kvm/pvm/host_mmu.c
@@ -51,9 +51,8 @@ static int __init guest_address_space_init(void)
                pml4_index_start = L4_PT_INDEX(PVM_GUEST_MAPPING_START);
                pml4_index_end = L4_PT_INDEX(RAW_CPU_ENTRY_AREA_BASE);

-               pvm_va_range = get_vm_area_align(DEFAULT_RANGE_L5_SIZE, PT_L5_SIZE,
-                                                VM_ALLOC|VM_NO_GUARD);
-               if (!pvm_va_range) {
+               pvm_va_range = find_vm_area((void *)(VMALLOC_END + 1 - DEFAULT_RANGE_L5_SIZE));
+               if (!pvm_va_range || pvm_va_range->size != DEFAULT_RANGE_L5_SIZE) {
                        pml5_index_start = 0x1ff;
                        pml5_index_end = 0x1ff;
                } else {
@@ -62,9 +61,8 @@ static int __init guest_address_space_init(void)
                                                     (u64)pvm_va_range->size);
                }
        } else {
-               pvm_va_range = get_vm_area_align(DEFAULT_RANGE_L4_SIZE, PT_L4_SIZE,
-                                                VM_ALLOC|VM_NO_GUARD);
-               if (!pvm_va_range)
+               pvm_va_range = find_vm_area((void *)(VMALLOC_END + 1 - DEFAULT_RANGE_L4_SIZE));
+               if (!pvm_va_range || pvm_va_range->size != DEFAULT_RANGE_L4_SIZE)
                        return -1;

                pml4_index_start = L4_PT_INDEX((u64)pvm_va_range->addr);
@@ -133,8 +131,6 @@ int __init host_mmu_init(void)

 void host_mmu_destroy(void)
 {
-       if (pvm_va_range)
-               free_vm_area(pvm_va_range);
        if (host_mmu_root_pgd)
                free_page((unsigned long)(void *)host_mmu_root_pgd);
        if (host_mmu_la57_top_p4d)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6e4b95f24bd8..3fead6a4f5c9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2680,6 +2680,7 @@ struct vm_struct *find_vm_area(const void *addr)

        return va->vm;
 }
+EXPORT_SYMBOL_GPL(find_vm_area);

After further investigation, I was able to reproduce the NULL pointer access in pvm_vcpu_run() in my testing environment. The accessed address is pvm->pvcs_gpc.khva, and because of the NULL pointer access the host XSAVE state is not restored, which results in the error message about the xrstors64 failure.

The main problem still lies in the failed restoration of MSR_PVM_VCPU_STRUCT. The fix for #2 is not sufficient and, moreover, it introduces the current problem. Could you try the following patch to see if my assumption is correct? It works in my testing environment.

diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c
index 466f989cbcc3..2c83bb3251b6 100644
--- a/arch/x86/kvm/pvm/pvm.c
+++ b/arch/x86/kvm/pvm/pvm.c
@@ -1193,10 +1193,13 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                 * user memory region before the VM entry.
                 */
                pvm->msr_vcpu_struct = data;
-               if (!data)
+               if (!data) {
                        kvm_gpc_deactivate(&pvm->pvcs_gpc);
-               else if (kvm_gpc_activate(&pvm->pvcs_gpc, data, PAGE_SIZE))
+               } else if (kvm_gpc_activate(&pvm->pvcs_gpc, data, PAGE_SIZE)) {
+                       if (msr_info->host_initiated)
+                               kvm_make_request(KVM_REQ_GPC_REFRESH, vcpu);
                        return 1;
+               }
                break;
        case MSR_PVM_SUPERVISOR_RSP:
                pvm->msr_supervisor_rsp = msr_info->data;

@bysui bysui self-assigned this Apr 26, 2024
pojntfx added a commit to loopholelabs/linux-pvm-ci that referenced this issue Apr 26, 2024
@pojntfx
Author

pojntfx commented Apr 26, 2024

Happy to report that this workaround fixed it! We just managed to snapshot/restore across two different EC2 instances with your two patches applied (we had to make some minor adjustments to get them to apply because of some syntax issues: https://github.com/loopholelabs/linux-pvm-ci/blob/master/patches/use-fixed-pvm-range.patch and https://github.com/loopholelabs/linux-pvm-ci/blob/master/patches/fix-xsave-restore.patch). No FPU bugs or kernel crashes happened on the guest or the host :) Demo video:

pvm-ec2-migration.mp4

@pojntfx
Author

pojntfx commented Apr 26, 2024

Note

EDIT: Not an issue with PVM - it is actually an issue with Cloud Hypervisor not being able to mask CPU features that aren't available on both hosts; it works fine with our fork of Firecracker. See #7 (comment) for more information and disregard this comment.

I also just tested a migration between two separate clouds (AWS → GCP), and it looks like the issue still persists there; set_xsave fails, and so does set_xcrs. Neither host has 5-level paging enabled - any idea what might be going on here? If I comment out the set_xsave and set_xcrs calls in Cloud Hypervisor, it starts a new VM instead of resuming from the old state. I've also tried the patch you posted (the one for pvm_set_msr); while it does get EC2-internal migrations to work, migrations outside of EC2 still have the same issue.

Here is the error log from cloud-hypervisor (without commenting out set_xsave and set_xcrs ofc):

$ cd ~/Projects/pvm-experimentation/ && rm -f /tmp/cloud-hypervisor.sock && cloud-hypervisor     --api-socket /tmp/cloud-hypervisor.sock     --restore source_url=file:///home/pojntfx/Downloads/drafter-snapshots
cloud-hypervisor: 11.644681ms: <vmm> ERROR:arch/src/x86_64/mod.rs:558 -- Detected incompatible CPUID entry: leaf=0x7 (subleaf=0x0), register='EBX', compatilbe_check='BitwiseSubset', source VM feature='0xd18f072b', destination VM feature'0xc2f7b'.
Error restoring VM: VmRestore(CpuManager(VcpuCreate(Could not set the vCPU state SetXsaveState(Invalid argument (os error 22)))))

CPU infos for the two tested hosts:

# EC2
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  2
  On-line CPU(s) list:   0,1
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            7
    BogoMIPS:            5999.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid
                          aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase t
                         sc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   32 KiB (1 instance)
  L1i:                   32 KiB (1 instance)
  L2:                    1 MiB (1 instance)
  L3:                    35.8 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0,1
Vulnerabilities:         
  Gather data sampling:  Unknown: Dependent on hypervisor status
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:              Vulnerable
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Vulnerable
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
$ cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws root=UUID=2f1b4fb2-54ec-4124-a3b4-1776614e5e4c ro rootflags=subvol=root no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 pti=off nokaslr
  
  # GCP
  $ lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  1
  On-line CPU(s) list:   0
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.20GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4400.29
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid
                          tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp fsgsbase tsc_adjust
                          bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   32 KiB (1 instance)
  L1i:                   32 KiB (1 instance)
  L2:                    256 KiB (1 instance)
  L3:                    55 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Vulnerable
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Mitigation; IBRS
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; IBRS, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; Clear CPU buffers; SMT Host state unknown
$ cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-6.7.0-rc6-pvm-host-rocky-gcp root=UUID=fe4bce20-90c9-4d54-8d00-70e98ca7a7ac ro net.ifnames=0 biosdevname=0 scsi_mod.use_blk_mq=Y crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M console=ttyS0,115200 pti=off nokaslr

I also disabled the CPU compatibility check in Cloud Hypervisor because that failed:

diff --git a/arch/src/x86_64/mod.rs b/arch/src/x86_64/mod.rs
index 896a74d2..392e78a5 100644
--- a/arch/src/x86_64/mod.rs
+++ b/arch/src/x86_64/mod.rs
@@ -568,10 +568,9 @@ impl CpuidFeatureEntry {

       if compatible {
           info!("No CPU incompatibility detected.");
-            Ok(())
-        } else {
-            Err(Error::CpuidCheckCompatibility)
       }
+
+        Ok(())
   }
}

dmesg -w doesn't show any errors.

@pojntfx
Author

pojntfx commented Apr 27, 2024

Turns out #7 (comment) wasn't caused by anything PVM-related - it's just that restores fail due to CPU features (avx512 etc.) being available on the source host of a migration but not on the destination. Using Firecracker and its CPU templates makes it possible to reliably resume snapshots across GCP and EC2/completely different CPUs (by masking the CPU features that they don't have in common). We've tested it successfully with the T2 and T2CL CPU templates.
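
For reference, this is roughly how a static CPU template is selected via Firecracker's HTTP API before booting the VM that the snapshot is taken from (a minimal sketch; the socket path and machine sizing below are placeholders, and our fork may differ in the details):

# Select a CPU template so the snapshot only depends on CPUID features that are
# also present on the restore host. Adjust socket path, vCPU count and memory size.
curl --unix-socket /tmp/firecracker.sock -X PUT 'http://localhost/machine-config' \
    -H 'Content-Type: application/json' \
    -d '{"vcpu_count": 1, "mem_size_mib": 512, "cpu_template": "T2CL"}'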

We've also been able to revert this part of the patches you've posted (no difference in resume behavior): loopholelabs/linux-pvm-ci@d28ceb1 - is there a chance that this might be an accidental addition? It looks like a memory leak to us. We've updated the CI repo (https://github.com/loopholelabs/linux-pvm-ci) to reflect this change already.

One issue we still run into, however (with and without reverting this change), is that after a few migrations between hosts (with this specific setup, ~5-7 migrations) VMs start resuming significantly more slowly (sometimes taking up to a minute). Rebooting the hosts and doing a migration afterwards makes them resume immediately again, even if it's the same snapshot - could there be some sort of memory leak that keeps PVM from finding the necessary memory regions after multiple VMs have been resumed on one host? We're not resuming multiple VMs at the same time, just one after the other (stopping the previous one before starting a new migration), yet this still happens.

Let me know if there is anything I can do to help debug this!

@bysui
Collaborator

bysui commented Apr 28, 2024

Turns out #7 (comment) wasn't caused by anything PVM related - it's just that restores fail due to different CPU features (avx512 etc.) being available on the source host of a migration but not on the destination end. Using Firecracker and its CPU templates makes it possible to reliably resume snapshots across GCP and EC2/completely different CPUs (by masking the CPU features that they don't have in common). We've tested it successfully with the T2 and T2CL CPU templates.

We've also been able to revert this part of the patches you've posted (no difference in resume behavior): loopholelabs/linux-pvm-ci@d28ceb1 - is there a chance that this might be an accidental addition? It looks like a memory leak to us. We've updated the CI repo (https://github.com/loopholelabs/linux-pvm-ci) to reflect this change already.

No, it was deleted deliberately. As I mentioned earlier, percpu allocation in the VMALLOC area occurs from top to bottom, so a fixed area cannot be guaranteed if it is allocated after booting. Therefore, I chose to reserve an area during booting (before the percpu area initialization). PVM checks whether the expected area exists during module loading. As you can see, I define the 'pvm' variable in x86_reserve_vmalloc_range() as a static variable, so it shouldn't be freed. You will receive a warning if you attempt to free the area when unloading the PVM module, and you won't be able to load the PVM module again because the area is no longer reserved. Additionally, this method also allows multiple PVM module instances to coexist (e.g., for live-upgrading the PVM module).

We will make this behavior a kernel boot parameter. If you want to use PVM, you should pass the new parameter to reserve the fixed area in the VMALLOC area (and, if you want to use migration with PVM, also disable VMALLOC area randomization). I will discuss the details with my colleague. Also, if you have any preferences or concerns about convenience, please feel free to share them.

One error we still run into however (with & without reverting this change) is that after a few migrations between hosts (with this specific setup, ~5-7 migrations) VMs start resuming significantly more slowly (sometimes it takes up to a minute). Rebooting the hosts & doing a migration afterwards makes them resume immediately again, even if it's the same snapshot - is there a chance there might be some sort of memory leak etc. that's making PVM not find the necessary memory regions after multiple VMs have been resumed on one host? We're not resuming multiple VMs at the same time, just one after the other (stopping the last one before starting a new migration), yet this still happens.

Let me know if there is anything I can do to help debug this!

Sorry, I'm not clear about the problem. Are you saying that there are only two hosts involved, or more? Assuming there are two hosts, we'll call them host1 and host2. Is the migration always from host1 to host2, or is it bidirectional (host1 <-> host2)? Additionally, does rebooting one or both hosts solve the problem? Have you tried unloading and reloading the PVM modules? Did you encounter any warnings or errors during the unloading? Does the problem persist after reloading the module? Perhaps we can open a new issue to track it more clearly.

bysui added a commit that referenced this issue Apr 29, 2024
The commit eb49d06 ("KVM: x86/PVM: Store the valid value for
MSR_PVM_VCPU_STRUCT unconditionally") aimed to address the failure to
restore a snapshot caused by the MSR_PVM_VCPU_STRUCT restoration failure
by storing the value before kvm_gpc_activate(). However, this fix only
worked accidentally, because the GPC is refreshed by timer IRQ handling
rather than by the addition of a memslot. If no timer IRQ is injected
before the first VM entry, the host panics due to a NULL pointer access
of 'pvcs_gpc.khva'. Therefore, following the PVM specification, a GPC
refresh request is now made if the GPC fails to activate while the MSR
is being set by the host. For the guest, setting an invalid MSR value
will trigger a triple fault. Additionally, a WARN_ON_ONCE() is added in
pvm_vcpu_run() to capture unexpected bugs where 'pvcs_gpc.khva' is NULL
while the MSR value is not NULL.

Fixes: eb49d06 ("KVM: x86/PVM: Store the valid value for MSR_PVM_VCPU_STRUCT unconditionally")
Signed-off-by: Hou Wenlong <[email protected]>
Link: #7
@pojntfx
Author

pojntfx commented May 6, 2024

Sorry for the delay in my response, I was OOO for significant parts of last week due to medical reasons.

No, it was deleted deliberately. As I mentioned earlier, percpu allocation in the VMALLOC area occurs from top to bottom, so it cannot guarantee the allocation of a fixed area after booting. Therefore, I choose to reserve an area during booting (before the percpu area initialization). PVM will check whether the expected area exists during module loading. As you can see, I define the 'pvm' variable in x86_reserve_vmalloc_range() as a static variable, so it shouldn't be freed. You will receive a warning if you attempt to free the area when unloading the PVM module. Furthermore, you won't be able to load the PVM module again as the area has not been reserved. Additionally, using this method is also suitable for allowing multiple PVM module instances to coexist (e.g., for live upgrade PVM module).

Thanks a lot! Yes, this makes a lot of sense; a kernel parameter seems like a good way to implement this behavior. From a user perspective, when using migrations, would this mean that the memory region that PVM can use/that VMs can use would need to be reserved ahead of time when loading the KVM module? Would the size of this region also be configurable through this kernel parameter, and would there be limits as to how much memory we could reserve?

Sorry, I'm not clear about the problem. Are you saying that there are only two hosts involved, or more? Assuming there are two hosts, we'll call them host1 and host2. Is the migration always from host1 to host2, or is it bidirectional (host1 <-> host2)? Additionally, does rebooting one or both hosts solve the problem? Have you tried unloading and reloading the PVM modules? Did you encounter any warnings or errors during the unloading? Does the problem persist after reloading the module? Perhaps we can open a new issue to track it more clearly.

This happens when migrating from host 1 to host 2, as well as from host 2 to host 1 - it's bidirectional. Let's say we migrate from host 1 to host 2 - the first migration works flawlessly. When we stop this migrated VM on host 2 and migrate the VM from host 1 to host 2 again, the restore takes much longer. If we do this ~5 times, the VM doesn't resume at all/hangs. In this case, if we reboot host 2 and then migrate the VM from host 1 to host 2 again, it resumes immediately again - and then after ~5 times it stops working again. Unloading and then reloading the module has the same effect: migrations start working again afterwards. So far we haven't been able to reproduce this on migrations between the same instance types (say two EC2 instances of the same type), but it happens when we migrate between two different instance types (say EC2 → GCP) - even if it's the same Intel CPU generation. We can't reproduce this with Cloud Hypervisor since we can't resume the VM at all on another host (because it lacks the concept of CPU templates), but we can do so with Firecracker reliably.

Any idea what might be causing this? Let me know if you need additional docs to reproduce or if I can help with additional debugging info.

@bysui
Collaborator

bysui commented May 9, 2024

Sorry for the delay in my response, I was OOO for significant parts of last week due to medical reasons.

I'm sorry to hear that. I hope you are feeling better now.

No, it was deleted deliberately. As I mentioned earlier, percpu allocation in the VMALLOC area occurs from top to bottom, so it cannot guarantee the allocation of a fixed area after booting. Therefore, I choose to reserve an area during booting (before the percpu area initialization). PVM will check whether the expected area exists during module loading. As you can see, I define the 'pvm' variable in x86_reserve_vmalloc_range() as a static variable, so it shouldn't be freed. You will receive a warning if you attempt to free the area when unloading the PVM module. Furthermore, you won't be able to load the PVM module again as the area has not been reserved. Additionally, using this method is also suitable for allowing multiple PVM module instances to coexist (e.g., for live upgrade PVM module).

Thanks a lot! Yes, this makes a lot of sense; a kernel parameter seems like a good way to implement this behavior. From a user perspective, when using migrations, would this mean that the memory region that PVM can use/that VMs can use would need to be reserved ahead of time when loading the KVM module? Would the size of this region also be configurable through this kernel parameter, and would there be limits as to how much memory we could reserve?

Yes, if someone wants to use migrations, they must ensure that the kernel parameter is set and the range is reserved during booting. The PVM module will try to find the reserved range first; if it doesn't find one, it will fall back to dynamic allocation and may emit a warning indicating that migration may not be available. Regarding the size of the region, my colleague suggested that we could use the same format as crashkernel (kexec), which is offset + size. However, I think this may be a little over-designed, because the region is a virtual address range rather than physical memory, and the user may not actually care about it. Different offset + size values on different hosts could cause migration to fail. What do you think?
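
For illustration only, a crashkernel-style parameter might end up looking something like this on every host's kernel command line (the parameter name and syntax here are purely hypothetical, nothing is implemented yet, and the value would have to be identical on all hosts involved in migration):

# Hypothetical boot parameter, shown only to illustrate the crashkernel-style idea;
# the real name/format is still under discussion.
sudo grubby --args="pvm_reserve_vmalloc=16T nokaslr pti=off" --update-kernel /boot/vmlinuz-6.7.0-rc6-pvm-host-fedora-aws
sudo reboot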

Sorry, I'm not clear about the problem. Are you saying that there are only two hosts involved, or more? Assuming there are two hosts, we'll call them host1 and host2. Is the migration always from host1 to host2, or is it bidirectional (host1 <-> host2)? Additionally, does rebooting one or both hosts solve the problem? Have you tried unloading and reloading the PVM modules? Did you encounter any warnings or errors during the unloading? Does the problem persist after reloading the module? Perhaps we can open a new issue to track it more clearly.

This happens when migrating from host 1 to host 2, as well as from host 2 to host 1 - it's bi-directional. Let's say we migrate from host 1 to host 2 - the first migrations works flawlessly. When we stop this migrated VM on host 2, and we migrate the VM from host 1 to host 2 again, the restore takes much longer. If we do this ~5 times, the VM doesn't resume at all/hangs. In this case, if we reboot host 2, and then migrate the VM from host 1 to host 2 again, it resumes again immediately - and then after ~5 times it stops working again. Unloading and then reloading the module has the same effect, migrations start working again after this. So far we weren't able to get this to reproduce on migrations between same instance types (like say two EC2 instances of the same type), but it happens when we migrate between two different instance types (like say EC2 → GCP) - even if it's the same Intel CPU generation. We can't reproduce this with Cloud Hypervisor since we can't resume the VM at all on another host (because it lacks the concept of CPU templates), but we can do so with Firecracker reliably.

Any idea what might be causing this? Let me know if you need additional docs to reproduce or if I can help with additional debugging info.

Sorry, I don't have EC2/GCP instances, but I will try to see if I can reproduce the problem on my physical machine. However, I have a few more questions. When you say 'stop the migrated VM', do you mean pausing the VM or shutting it down? Are the vCPUs still running while the restored VM is hanging? You can use the perf tools to trace the kvm_exit event on the target host:

perf record -a -e kvm:kvm_exit
perf script
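
If the vCPUs are still running, it can also help to aggregate the exit reasons from the trace to see what the guest is spinning on. This is only a rough sketch - the exact field layout of the kvm_exit tracepoint output may differ between kernel versions, so the pattern may need adjusting:

# Count kvm_exit reasons from the recorded trace (adjust the pattern to your perf output).
perf script | grep -o 'reason [A-Z_0-9]*' | sort | uniq -c | sort -rn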

@pojntfx
Author

pojntfx commented Jun 6, 2024

Just a quick update from our side - sorry for the delay. We're still gathering more info on this, but right now we're blocked by work on other parts of the live migration implementation data plane before we can re-test.

Regarding this:

However, I think this may be a little over-designed because the region is a virtual address range instead of physical address memory, and the user may not actually care about it. Different offset + size on different hosts could cause migration to fail. What do you think?

Do I understand correctly that for migration to work, the kernel module would need to be loaded with the same offset + size parameters on all hosts where a VM snapshot could be resumed? How would this work with multiple VMs running on a single PVM kernel module instance?

As for the testing with perf, I'll get to it as soon as the remaining work on the live migration data plane is finished ^^

@bysui
Collaborator

bysui commented Jun 11, 2024

Just a quick update from our side - sorry for the delay. We're still gathering more info on this, but right now we're blocked by work on other parts of the live migration implementation data plane before we can re-test.

Regarding this:

However, I think this may be a little over-designed because the region is a virtual address range instead of physical address memory, and the user may not actually care about it. Different offset + size on different hosts could cause migration to fail. What do you think?

Do I understand correctly that for migration to work, the kernel module would need to be loaded with the same offset + size parameters on all hosts where a VM snapshot could be resumed? How would this work with multiple VMs running on a single PVM kernel module instance?

Yes, you are correct. However, the parameters referred to here are the host kernel boot parameters, not the PVM kernel module parameters. My colleague suggested that it's the administrator's responsibility to provide suitable offset + size kernel boot parameters on all hosts if they want to allow migration between those hosts. :(

The offset + size can be larger than the size actually needed by the PVM kernel module, and the PVM kernel module can then choose a suitable slot within this reserved range for the guest. However, I believe this is inconvenient and overly complex for the administrator. We are still discussing it, since snapshot and migration were not initially needed for our internal use case. We are looking for a better design for this and would also like advice from users.

As for the testing with perf, I'll get to it as soon as the remaining work on the live migration data plane is finished ^^
