Skip to content

Signal SIGILL (Illegal instruction) code ILL_ILLOPN (Illegal operand) after migrate to .net9 #112897

@Typhon226

Description

@Typhon226

Description

We are currently migrating to .net9 and after some the our application crashes without any exception.
At first i found this message in the journal of our ubuntu 22.04:
kernel: traps: .NET Server GC[1867769] trap invalid opcode ip:7f2eb7bdc9d1 sp:7f2eb0de1e90 error:0 in libcoreclr.so[7f2eb7784000+4ed000]

Then, after some digging, i was able to generate a dump with the binary of this bug
I used WinDbg to open it and saw the following error:
Signal SIGILL (Illegal instruction) code ILL_ILLOPN (Illegal operand) at 0x7fcae8ee19d1
Locking at the address is saw this:
Image

The crash did not happen at the same timings. It's arround 20-40 seconds until this happens.

The application is running on a small kubenetes system with only one node.
If i host everything on my local machine everything works fine.

Reproduction Steps

Sadly i don't now how to provide a reproduction step without giving access to the server.
I can provide a dump if needed.

Expected behavior

No crash

Actual behavior

As written in the description.
If needed i can provide a dump (~730MB uncompressed, ~65MB compressed)

Regression?

No response

Known Workarounds

Going back to .net8

Configuration

dotnet info of the kubernetes pod:
.NET SDK:
 Version:           9.0.200
 Commit:            90e8b202f2
 Workload version:  9.0.200-manifests.b4a8049f
 MSBuild version:   17.13.8+cbc39bea8

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  22.04
 OS Platform: Linux
 RID:         linux-x64
 Base Path:   /usr/share/dotnet/sdk/9.0.200/

.NET workloads installed:
There are no installed workloads to display.
Configured to use loose manifests when installing new manifests.

Host:
  Version:      9.0.2
  Architecture: x64
  Commit:       80aa709f5d

.NET SDKs installed:
  9.0.200 [/usr/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 9.0.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 9.0.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  Not set

global.json file:
  Not found
Host CPU info:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb000036
cpu MHz         : 2199.998
cache size      : 16384 KB
physical id     : 0
siblings        : 6
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat umip md_clear arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa mmio_stale_data bhi
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
CPU info of the Pod:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb000036
cpu MHz         : 2199.998
cache size      : 16384 KB
physical id     : 0
siblings        : 6
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat umip md_clear arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa mmio_stale_data bhi
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Other information

No response

Metadata

Metadata

Assignees

Labels

area-GC-coreclravx512Related to the AVX-512 architecturein-prThere is an active PR which will close this issue when it is mergedneeds-author-actionAn issue or pull request that requires more info or actions from the author.

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions