Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi/extension_methods_test went from Pass -> Crash on vm-win-release-ia32 #59779

Closed
derekxu16 opened this issue Dec 20, 2024 · 18 comments
Closed
Labels
area-native-interop Used for native interop related issues, including FFI. area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. gardening library-ffi os-windows triaged Issue has been triaged by sub team

Comments

@derekxu16
Copy link
Member

/=================================================================\
| ffi/extension_methods_test broke (Pass -> Crash, expected Pass) |
\=================================================================/

--- Command "vm" (took 02.000576s):
set DART_CONFIGURATION=ReleaseIA32 & set DART_SUPPRESS_WER=1 & set DART_CRASHPAD_HANDLER=C:\b\s\w\ir\out\ReleaseIA32\crashpad_handler.exe & out\ReleaseIA32\dart.exe --sound-null-safety -Dtest_runner.configuration=vm-win-release-ia32 --ignore-unrecognized-flags --packages=C:\b\s\w\ir\.dart_tool\package_config.json C:\b\s\w\ir\tests\ffi\extension_methods_test.dart

exit code:
-1073741819

stdout:
done

--- Re-run this test:
python3 tools/test.py -n vm-win-release-ia32 ffi/extension_methods_test

https://ci.chromium.org/ui/p/dart/builders/ci.sandbox/vm-win-release-ia32/602/overview

@dcharkes

@derekxu16 derekxu16 added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. gardening triaged Issue has been triaged by sub team labels Dec 20, 2024
@halildurmus
Copy link
Contributor

halildurmus commented Dec 20, 2024

@dcharkes I've looked into this, and it seems to be related to the use of NativeFinalizers.

I can also reproduce the crash with this code:

void testCompoundRefWithFinalizer() {
  final ptr = calloc<Uint8>();
  final typedData = ptr.asTypedList(1, finalizer: calloc.nativeFree);
  // final vec4 = Struct.create<Vec4>();
  // vec4
  //   ..x = 1.2
  //   ..y = 3.4
  //   ..z = 5.6
  //   ..w = 7.8;
  // final result = twiddleVec4Components(vec4);
  // Expect.equals(3.4, result.x);
  // Expect.equals(5.6, result.y);
  // Expect.equals(7.8, result.z);
  // Expect.equals(1.2, result.w);
}

Here's the output:

--- Command "vm" (took 02.000479s):
set DART_CONFIGURATION=ReleaseIA32 & set DART_SUPPRESS_WER=1 & out\ReleaseIA32\dart.exe --sound-null-safety -Dtest_runner.configuration=vm-win-release-ia32 --ignore-unrecognized-flags --packages=C:\src\dart-sdk\sdk\.dart_tool\package_config.json C:\src\dart-sdk\sdk\tests\ffi\extension_methods_test.dart

exit code:
-1073740791

stdout:
done

stderr:
===== CRASH =====
ExceptionCode=-1073741819, ExceptionFlags=0, ExceptionAddress=057F6EE0
version=3.7.0-edge.4a52cbc5decfc9fced130cf0b9f1449aa31ed553 (main) (Fri Dec 20 03:14:14 2024 -0800) on "windows_ia32"
pid=11072, thread=9140, isolate_group=main(034EAF10), isolate=main(05776680)
os=windows, arch=ia32, comp=no, sim=no
isolate_instructions=918f40, vm_instructions=918f60
fp=b3f7ed, sp=2fded74, pc=57f6ee0
Stack dump aborted because GetAndValidateThreadStackBounds failed.
  pc 0x057f6ee0 fp 0x00b3f7ed Unknown symbol

@dcharkes
Copy link
Contributor

dcharkes commented Dec 20, 2024

I can also reproduce the crash with this code:

Does it reliably reproduce, or flakily?

It seems like this is an existing bug, but it's suspicious it started occurring from your CL. Your CL shouldn't have changed how existing code is executed.

The best way to investigate this is to build a Debug build, repro it there and run dart.exe in visual studio so that the debugger pops up on the crash.

Maybe we mark this as somewhat low prio, we'd be fixing it for Dart 3.7, but we're removing IA32 in Dart 3.8 (#59698). And I believe the 3.7 branch cut might be soon already. It doesn't ia32 specific, it just happens to trigger on it.

@dcharkes dcharkes added crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. library-ffi area-native-interop Used for native interop related issues, including FFI. labels Dec 20, 2024
@halildurmus
Copy link
Contributor

Does it reliably reproduce, or flakily?

Yes, it reproduces reliably every time I run the test.

The best way to investigate this is to build a Debug build, repro it there and run dart.exe in visual studio so that the debugger pops up on the crash.

I'll try to do this, thanks.

@halildurmus
Copy link
Contributor

halildurmus commented Dec 20, 2024

@dcharkes I've managed to capture the exception. Here's the screenshot:

image

Please let me know if you need more information.

I've used the following code to reproduce the crash:

import 'dart:ffi';

import 'package:ffi/ffi.dart';

void main() {
  final p = calloc<Uint8>();
  final typedData = p.asTypedList(1, finalizer: calloc.nativeFree);
}

@dcharkes
Copy link
Contributor

Please let me know if you need more information.

The stack 🙏

@halildurmus
Copy link
Contributor

image

I should also mention the original program I've used to debug was:

import 'dart:ffi';
import 'dart:io';

import 'package:ffi/ffi.dart';

void main() async {
  sleep(Duration(seconds: 30));
  print('start');
  final p = calloc<Uint8>();
  final typedData = p.asTypedList(1, finalizer: calloc.nativeFree);
  print('done');
}

I've used sleep to give myself some time to attach the process to the debugger 😄

@dcharkes
Copy link
Contributor

An access violation while trying to unlock a mutex, that sounds like the memory of the mutex itself has been released already. And it's doing so while trying to process a message in the message handler.

If a mutex memory is no longer accessible, then something owning that mutex must have been freed, which is then the ThreadPool. Which could be the MutatorThreadPool of the isolate. You could pass --trace-isolates to dart.exe to see if the isolate has already been shutdown.

(Why that behavior would be different on ia32 compared to other architectures though?)

@halildurmus
Copy link
Contributor

You could pass --trace-isolates to dart.exe to see if the isolate has already been shutdown.

See logs
C:\src\dart-sdk\sdk\out\DebugIA32\dart-sdk\bin>dart --trace-isolates run bug\bin\bug.dart
[+] Opening port:
      handler:    vm-isolate
      port:       690973867323359
Size of vm isolate snapshot = 12898
New space (0k of 0k) Old space (125k of 1024k)
VM Isolate: Number of symbols : 897
VM Isolate: Symbol table capacity : 2048
[+] Opening port:
      handler:    vm-service
      port:       794159082638247
[+] Starting isolate:
      isolate:    vm-service
Size of isolate snapshot = 5149351
[+] Opening port:
      handler:    dartdev
      port:       4650200257518951
[+] Starting isolate:
      isolate:    dartdev
Size of isolate snapshot = 5149351
New space (0k of 0k) Old space (2381k of 2948k)
New space (0k of 0k) Old space (2381k of 2948k)
[+] Opening port:
      handler:    vm-service
      port:       629418774736511
[+] Starting message handler:
      handler:    vm-service
[+] Opening port:
      handler:    dartdev
      port:       6921948312434823
[+] Opening port:
      handler:    dartdev
      port:       726993444666075
[>] Posting message:
      len:        0
      source:     (4650200257518951) dartdev
      dest:       dartdev
      dest_port:  726993444666075
[+] Starting message handler:
      handler:    dartdev
[<] Handling message:
      len:        0
      handler:    dartdev
      port:       726993444666075
[-] Closing port:
      handler:    dartdev
      port:       726993444666075
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       dartdev
      dest_port:  4650200257518951
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       dartdev
      dest_port:  4650200257518951
[<] Handling message:
      len:        0
      handler:    dartdev
      port:       4650200257518951
[.] Message handled (OK):
      len:        0
      handler:    dartdev
      port:       4650200257518951
[<] Handling message:
      len:        0
      handler:    dartdev
      port:       4650200257518951
[.] Message handled (OK):
      len:        0
      handler:    dartdev
      port:       4650200257518951
[+] Opening port:
      handler:    dartdev
      port:       2809915921229563
[+] Opening port:
      handler:    IOService
      port:       8082918280705011
[>] Posting message:
      len:        17
      source:     <native code>
      dest:       dartdev
      dest_port:  2809915921229563
[.] Message handled (OK):
      len:        0
      handler:    dartdev
      port:       726993444666075
[<] Handling message:
      len:        17
      handler:    dartdev
      port:       2809915921229563
[-] Closing port:
      handler:    dartdev
      port:       2809915921229563
[.] Message handled (OK):
      len:        17
      handler:    dartdev
      port:       2809915921229563
[-] Stopping message handler (OK):
      handler:    dartdev
[>] Posting message:
      len:        59
      source:     (4650200257518951) dartdev
      dest:       vm-service
      dest_port:  629418774736511
[-] Closing all ports:
      handler:    dartdev
New space (164k of 1024k) Old space (16632k of 20468k)
[-] Stopping isolate:
      isolate:    dartdev
[<] Handling message:
      len:        59
      handler:    vm-service
      port:       629418774736511
[+] Opening port:
      handler:    main
      port:       2560850514164235
[+] Starting isolate:
      isolate:    main
Size of isolate snapshot = 5149351
[.] Message handled (OK):
      len:        59
      handler:    vm-service
      port:       629418774736511
New space (0k of 0k) Old space (2381k of 2948k)
[>] Posting message:
      len:        56
      source:     (2560850514164235) main
      dest:       vm-service
      dest_port:  629418774736511
[<] Handling message:
      len:        56
      handler:    vm-service
      port:       629418774736511
[.] Message handled (OK):
      len:        56
      handler:    vm-service
      port:       629418774736511
[+] Opening port:
      handler:    kernel-service
      port:       7186040393651111
[+] Starting isolate:
      isolate:    kernel-service
[>] Posting message:
      len:        66
      source:     (7186040393651111) kernel-service
      dest:       vm-service
      dest_port:  629418774736511
[<] Handling message:
      len:        66
      handler:    vm-service
      port:       629418774736511
[.] Message handled (OK):
      len:        66
      handler:    vm-service
      port:       629418774736511
[+] Opening port:
      handler:    kernel-service
      port:       4465367028955951
[+] Starting message handler:
      handler:    kernel-service
[+] Opening port:
      handler:    kernel-compilation-port
      port:       692974160615759
[>] Posting message:
      len:        8032616
      source:     (2560850514164235) main
      dest:       kernel-service
      dest_port:  4465367028955951
[<] Handling message:
      len:        8032616
      handler:    kernel-service
      port:       4465367028955951
[+] Opening port:
      handler:    kernel-service
      port:       2575176283582519
[>] Posting message:
      len:        0
      source:     (7186040393651111) kernel-service
      dest:       kernel-service
      dest_port:  2575176283582519
[.] Message handled (OK):
      len:        8032616
      handler:    kernel-service
      port:       4465367028955951
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       2575176283582519
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[>] Posting message:
      len:        0
      source:     <native code>
      dest:       kernel-service
      dest_port:  7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[<] Handling message:
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       7186040393651111
[.] Message handled (OK):
      len:        0
      handler:    kernel-service
      port:       2575176283582519
[+] Opening port:
      handler:    main
      port:       7079093806979759
[>] Posting message:
      len:        0
      source:     (2560850514164235) main
      dest:       main
      dest_port:  7079093806979759
[+] Starting message handler:
      handler:    main
[<] Handling message:
      len:        0
      handler:    main
      port:       7079093806979759
[-] Closing port:
      handler:    main
      port:       7079093806979759
done
[.] Message handled (OK):
      len:        0
      handler:    main
      port:       7079093806979759
[-] Stopping message handler (OK):
      handler:    main
[>] Posting message:
      len:        56
      source:     (2560850514164235) main
      dest:       vm-service
      dest_port:  629418774736511
[<] Handling message:
      len:        56
      handler:    vm-service
      port:       629418774736511

===== CRASH =====
ExceptionCode=-1073741819, ExceptionFlags=0, ExceptionAddress=095D3450
[.] Message handled (OK):
      len:        56
      handler:    vm-service
      port:       629418774736511

@dcharkes
Copy link
Contributor

Okay, so the main isolate message handler has already been shut down, but the vm-service message handler has not been shut down. The vm-service message is also successfully being handled, so that probably happens on a different thread than the crash. And the crash happens in posting the message to the message handler.

So, I'm pretty convinced something is trying to post a message to the main message handler after it has been freed. It could be that the isolate shutdown is posting a message to its own isolate because a GC is still happening. Finalizers being triggered in the GC usually posts a message to the isolate to cleanup some data structure on the Dart heap. Could you try passing --trace-shutdown in addition to --trace-isolates. And maybe also --trace-finalizers.

You're seeing all corners of the VM now. 😄

@halildurmus
Copy link
Contributor

Could you try passing --trace-shutdown in addition to --trace-isolates. And maybe also --trace-finalizers.

See logs
C:\src\dart-sdk\sdk\out\DebugIA32\dart-sdk\bin>dart --trace-isolates --trace-shutdown --trace-finalizers run bug\bin\bug.dart
[+] Opening port:
        handler:    vm-isolate
        port:       1918744341324119
Size of vm isolate snapshot = 12898
New space (0k of 0k) Old space (125k of 1024k)
VM Isolate: Number of symbols : 897
VM Isolate: Symbol table capacity : 2048
[+] Opening port:
        handler:    vm-service
        port:       439813594960907
[+] Starting isolate:
        isolate:    vm-service
Size of isolate snapshot = 5149351
[+] Opening port:
        handler:    dartdev
        port:       8986765789727259
[+] Starting isolate:
        isolate:    dartdev
Size of isolate snapshot = 5149351
New space (0k of 0k) Old space (2381k of 2948k)
New space (0k of 0k) Old space (2381k of 2948k)
[+] Opening port:
        handler:    vm-service
        port:       2067650076764851
[+] Starting message handler:
        handler:    vm-service
[+] Opening port:
        handler:    dartdev
        port:       4910143243427547
[+] Opening port:
        handler:    dartdev
        port:       7612085297583539
[>] Posting message:
        len:        0
        source:     (8986765789727259) dartdev
        dest:       dartdev
        dest_port:  7612085297583539
[+] Starting message handler:
        handler:    dartdev
[<] Handling message:
        len:        0
        handler:    dartdev
        port:       7612085297583539
[-] Closing port:
        handler:    dartdev
        port:       7612085297583539
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       dartdev
        dest_port:  8986765789727259
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       dartdev
        dest_port:  8986765789727259
[<] Handling message:
        len:        0
        handler:    dartdev
        port:       8986765789727259
[.] Message handled (OK):
        len:        0
        handler:    dartdev
        port:       8986765789727259
[<] Handling message:
        len:        0
        handler:    dartdev
        port:       8986765789727259
[.] Message handled (OK):
        len:        0
        handler:    dartdev
        port:       8986765789727259
[+] Opening port:
        handler:    dartdev
        port:       1150815946996399
[+] Opening port:
        handler:    IOService
        port:       1545131101820199
[>] Posting message:
        len:        17
        source:     <native code>
        dest:       dartdev
        dest_port:  1150815946996399
[.] Message handled (OK):
        len:        0
        handler:    dartdev
        port:       7612085297583539
[<] Handling message:
        len:        17
        handler:    dartdev
        port:       1150815946996399
[-] Closing port:
        handler:    dartdev
        port:       1150815946996399
[.] Message handled (OK):
        len:        17
        handler:    dartdev
        port:       1150815946996399
[-] Stopping message handler (OK):
        handler:    dartdev
[>] Posting message:
        len:        59
        source:     (8986765789727259) dartdev
        dest:       vm-service
        dest_port:  2067650076764851
[<] Handling message:
        len:        59
        handler:    vm-service
        port:       2067650076764851
[-] Closing all ports:
        handler:    dartdev
New space (164k of 1024k) Old space (16632k of 20468k)
[-] Stopping isolate:
        isolate:    dartdev
[+878ms] SHUTDOWN: Shutdown starting for group dartdev
[+885ms] SHUTDOWN: Notifying isolate group shutdown (dartdev)
[+885ms] SHUTDOWN: Done Notifying isolate group shutdown (dartdev)
[+885ms] SHUTDOWN: Done shutdown for group dartdev
[.] Message handled (OK):
        len:        59
        handler:    vm-service
        port:       2067650076764851
[+] Opening port:
        handler:    main
        port:       4084362195583655
[+] Starting isolate:
        isolate:    main
Size of isolate snapshot = 5149351
New space (0k of 0k) Old space (2381k of 2948k)
[>] Posting message:
        len:        56
        source:     (4084362195583655) main
        dest:       vm-service
        dest_port:  2067650076764851
[<] Handling message:
        len:        56
        handler:    vm-service
        port:       2067650076764851
[.] Message handled (OK):
        len:        56
        handler:    vm-service
        port:       2067650076764851
[+] Opening port:
        handler:    kernel-service
        port:       4456700769857399
[+] Starting isolate:
        isolate:    kernel-service
[>] Posting message:
        len:        66
        source:     (4456700769857399) kernel-service
        dest:       vm-service
        dest_port:  2067650076764851
[<] Handling message:
        len:        66
        handler:    vm-service
        port:       2067650076764851
[.] Message handled (OK):
        len:        66
        handler:    vm-service
        port:       2067650076764851
[+] Opening port:
        handler:    kernel-service
        port:       40826993182675
[+] Starting message handler:
        handler:    kernel-service
[+] Opening port:
        handler:    kernel-compilation-port
        port:       1955371217559375
[>] Posting message:
        len:        8032616
        source:     (4084362195583655) main
        dest:       kernel-service
        dest_port:  40826993182675
[<] Handling message:
        len:        8032616
        handler:    kernel-service
        port:       40826993182675
[+] Opening port:
        handler:    kernel-service
        port:       4670741111683503
[>] Posting message:
        len:        0
        source:     (4456700769857399) kernel-service
        dest:       kernel-service
        dest_port:  4670741111683503
[.] Message handled (OK):
        len:        8032616
        handler:    kernel-service
        port:       40826993182675
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4670741111683503
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[>] Posting message:
        len:        0
        source:     <native code>
        dest:       kernel-service
        dest_port:  4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[<] Handling message:
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4456700769857399
[.] Message handled (OK):
        len:        0
        handler:    kernel-service
        port:       4670741111683503
[+] Opening port:
        handler:    main
        port:       5323504413491539
[>] Posting message:
        len:        0
        source:     (4084362195583655) main
        dest:       main
        dest_port:  5323504413491539
[+] Starting message handler:
        handler:    main
[<] Handling message:
        len:        0
        handler:    main
        port:       5323504413491539
[-] Closing port:
        handler:    main
        port:       5323504413491539
Setting external size from  0 to  1 bytes in new space
done
[.] Message handled (OK):
        len:        0
        handler:    main
        port:       5323504413491539
[-] Stopping message handler (OK):
        handler:    main
[>] Posting message:
        len:        56
        source:     (4084362195583655) main
        dest:       vm-service
        dest_port:  2067650076764851
Isolate 0CCEE028 Setting finalizer 0C59A05C isolate to null
Isolate shutdown: Running native finalizer 0C59A05C callback 014284C0 with token 0C1B2F00
[<] Handling message:
        len:        56
        handler:    vm-service
        port:       2067650076764851

===== CRASH =====
ExceptionCode=-1073741819, ExceptionFlags=0, ExceptionAddress=0C1B2F00
[.] Message handled (OK):
        len:        56
        handler:    vm-service
        port:       2067650076764851

You're seeing all corners of the VM now. 😄

😆

@halildurmus
Copy link
Contributor

halildurmus commented Dec 20, 2024

I run the program again but this time also attached the VS debugger and got: Exception thrown at 0x0B9AA818 in dart.exe: 0xC0000005: Access violation executing location 0x0B9AA818.

Isolate 093557F0 Setting finalizer 0C51A454 isolate to null
Isolate shutdown: Running native finalizer 0C51A454 callback 014284C0 with token 0B9AA818

From the above logs, the 0x0B9AA818 address seems to belong to the finalizer token.

@dcharkes
Copy link
Contributor

I also attached the VS debugger and got: Exception thrown at 0x0B9AA818 in dart.exe: 0xC0000005: Access violation executing location 0x0B9AA818.

Isolate 093557F0 Setting finalizer 0C51A454 isolate to null
Isolate shutdown: Running native finalizer 0C51A454 callback 014284C0 with token 0B9AA818

From the above logs, the 0x0B9AA818 address seems to belong to the finalizer token.

Nice spotting!

That could mean that nativeFree is no longer executable. We could verify this by:

    if (FLAG_trace_finalizers) {
      THR_Print(
          "%s: Running native finalizer %p callback %p "
          "with token %p\n",
          trace_context, ptr()->untag(), callback, peer);
    }
    entry.set_token(entry);
    callback(peer);
    // Add if(trace) Print( ... to verify that we don't reach this.

And then we can triage why it's not executable.

It should be

https://github.com/dart-lang/native/blob/23a42e032d9cbe2347cd47546b14839e2b01bad6/pkgs/ffi/lib/src/allocation.dart#L37-L38

final Pointer<NativeFunction<WinCoTaskMemFreeNative>> winCoTaskMemFreePointer =
    ole32lib.lookup('CoTaskMemFree');

Hm, but it should be that the "token" is the parameter passed to the free function. So that should simply be p.address printed in hexadecimal. And winCoTaskMemFreePointer.address should be be the "callback". But the error message is saying we're trying to execute the token address? Can you print the addresses and see if they match?

@halildurmus
Copy link
Contributor

halildurmus commented Dec 20, 2024

That could mean that nativeFree is no longer executable. We could verify this by:

Yes, it crashes before reaching the print I added.

The weird thing is the addresses of the pointers doesn't match the addresses in the logs:
Exception thrown at 0x0CD554D8 in dart.exe: 0xC0000005: Access violation executing location 0x0CD554D8.

p: Pointer: address=0x5abb198
finalizer: Pointer: address=0x77309b50
Setting external size from  0 to  1 bytes in new space
done
Isolate 091A26F8 Setting finalizer 0D39A724 isolate to null
Isolate shutdown: Running native finalizer 0D39A724 callback 012A84C0 with token 091A6AF0

This was the code I used:

import 'dart:ffi';
import 'dart:io';

import 'package:ffi/ffi.dart';

final finalizer = calloc.nativeFree;

void main() {
  // sleep(Duration(seconds: 10));
  final p = calloc<Uint8>();
  print('p: $p');
  print('finalizer: $finalizer');
  final typedData = p.asTypedList(1, finalizer: finalizer);
  print('done');
}

Another thing I've tried is that instead of using asTypedList, I created a class and attached the pointer to the finalizer. This time the addresses matched and the print I added was reached but it still crashed at the Mutex::Unlock with an access violation:

p: Pointer: address=0x9cbfa78
Setting external size from  0 to  1 bytes in new space
finalizer: Pointer: address=0x77309b50
done
Isolate 09B73360 Setting finalizer 0CC1A124 isolate to null
Isolate shutdown: Running native finalizer 0CC1A124 callback 77309B50 with token 09CBFA78
Finalizer callback finished

This was the code I used:

import 'dart:ffi';
import 'dart:io';

import 'package:ffi/ffi.dart';

final finalizer = calloc.nativeFree;

class C implements Finalizable {
  C(this.p) {
    _finalizer.attach(this, p.cast());
  }

  static final _finalizer = NativeFinalizer(finalizer);

  final Pointer<Uint8> p;
}

void main() {
  // sleep(Duration(seconds: 10));
  final p = calloc<Uint8>();
  print('p: $p');
  final c = C(p);
  print('finalizer: $finalizer');
  // final typedData = p.asTypedList(1, finalizer: finalizer);
  print('done');
}

Next, I tried setting the externalSize parameter of the _finalizer.attach method to 1 and this time I got a totally different exception:

image

p: Pointer: address=0xa099a20
Setting external size from  0 to  1 bytes in new space
finalizer: Pointer: address=0x77309b50
done
Isolate 0A1FEAB8 Setting finalizer 0CF1A124 isolate to null
Isolate shutdown: Running native finalizer 0CF1A124 callback 77309B50 with token 0A099A20
Finalizer callback finished
Isolate shutdown: Clearing external size 1 bytes in new space

I'm so confused. 😆

Edit: I can also reproduce this in 9f82b8595dce2d269cf2fe32f313360d53a51192 which was the last commit before my CL merged.

@dcharkes
Copy link
Contributor

The weird thing is the addresses of the pointers doesn't match the addresses in the logs:

Another thing I've tried is that instead of using asTypedList, I created a class and attached the pointer to the finalizer. This time the addresses matched

Ah, it might be that asTypedList under the hood where uses a struct to store both the callback address and the token address and then calls a wrapper function that loads those two addresses, frees that struct and and then calls the actual callback. That's why the addresses could be different. See runtime/lib/ffi.cc AsTypedListFinalizerCallback.

If the error occurs with NativeFinalizers and the addresses match, then let's focus on that simpler case first. 👍

Finalizer callback finished

Interesting, so it's not the finalizer callback that's crashing.

Next, I tried setting the externalSize parameter of the _finalizer.attach method to 1 and this time I got a totally different exception:

But a fully symbolized stack trace! ❤️

It's is the same type of error, an access violation. So we're trying to access something that already has been freed. It's not immediately obvious what we're accessing that's free. The Isolate is still alive, it's on the call-stack. So is the IsolateGroup its in. This means that the NativeFinalizer::RunCallback must have still access to the group. The heap_ in the group is only freed in the IsolateGroup destructor, so that must still be alive as well. And the new_space_ in Heap must then be alive as well. So I'm not sure why we're getting an access violation.

Edit: I can also reproduce this in 9f82b8595dce2d269cf2fe32f313360d53a51192 which was the last commit before my CL merged.

You can try some very old commits to see how long it's been there. You can give tools/bisect.dart a spin if you find an old commit that doesn't repro it. It's used as follows:

dart tools/bisect.dart \
-Dend=842ce8264a376617e5c1d9837e700bb59c1895e5 \
-Dstart=25a309a85c449d1dce414096f1a6bc10d0bd5f7f \
-Dtest_command="python3 tools/build.py --sanitizer=tsan -ax64 -mrelease runtime runtime_precompiled" \
-Dtest_command="python3 tools/test.py -n vm-tsan-linux-release-x64 standalone/io/secure_session_resume_test/3" \
-Dfailure_string="data race"

(I don't think I've ever used bisect.dart on Windows before though, so you might hit some rough edges.)

P.S. I love your rigor! ❤️ Teamwork makes the dreamwork!

@halildurmus
Copy link
Contributor

halildurmus commented Dec 20, 2024

There's a test in tests/ffi/external_typed_data_finalizer_test.dart which uses posix calloc and free and this doesn't crash. I'm starting to suspect the problem is maybe with the Windows APIs we use in package:ffi? But why would that only fail with IA32?

void testAsTypedList() {
  const length = 10;
  final ptr = calloc(length, sizeOf<Int16>()).cast<Int16>();
  final typedList = ptr.asTypedList(length, finalizer: freePointer);
  print(typedList);
}

@Native<Pointer<Void> Function(IntPtr num, IntPtr size)>(isLeaf: true)
external Pointer<Void> calloc(int num, int size);

final freePointer = DynamicLibrary.process()
    .lookup<NativeFunction<Void Function(Pointer<Void>)>>('free');

@dcharkes
Copy link
Contributor

Edit: I can also reproduce this in 9f82b8595dce2d269cf2fe32f313360d53a51192 which was the last commit before my CL merged.

Actually, if the test was consistently succeeding on the bots before on all Dart versions, and now it's consistently failing. It could be that the Windows bots got updated yesterday (2024-12-19) with a newer Windows version (cc @athomas Did we update Windows bots yesterday?). And if it's failing for you locally, you probably have the same Windows version as the bots.

That means trying very old commits will probably reproduce the issue for you locally, and also on the bots if we revert.

There's a test tests/ffi/external_typed_data_finalizer_test.dart which uses posix calloc and free and this doesn't crash. I'm starting to suspect the problem is maybe with the Windows APIs we use in package:ffi?

That could be. That would be consistent with a Windows update changing the behavior of CoTaskMemFree on ia32. And it would be consistent with this error only showing up with the Windows memory API and not when changing to calloc and free.

(It's not consistent with getting access violations on two places in VM code when looking at the stack traces. Are the stack traces consistent on multiple runs in the debugger? And the two different VM crashes are both an access violation, so they do point to the same type of issue.)

But why would that only fail with IA32?

I am no Windows expert. I would refer to the experts doing package:win32, oh that's you! 😄

@halildurmus
Copy link
Contributor

halildurmus commented Dec 21, 2024

Ah, it might be that asTypedList under the hood where uses a struct to store both the callback address and the token address and then calls a wrapper function that loads those two addresses, frees that struct and and then calls the actual callback. That's why the addresses could be different. See runtime/lib/ffi.cc AsTypedListFinalizerCallback.

Exactly. I’ve confirmed that the addresses indeed match inside the AsTypedListFinalizerCallback function.

Interesting, so it's not the finalizer callback that's crashing.

It seems that way. I set some breakpoints and observed that the finalizer callbacks were executed without any issues. However, the crash always occurs right after the surrounding function completes execution -- in the case of .asTypedList, that’s the AsTypedListFinalizerCallback function; for the class with the finalizer, it’s NativeFinalizer::RunCallback. The crash consistently occurs in Mutex::Unlock with an access violation. However, if I specify the externalSize parameter, the crash instead happens in group->heap()->FreedExternal(external_size, space);, as I mentioned earlier.

It's is the same type of error, an access violation. So we're trying to access something that already has been freed. It's not immediately obvious what we're accessing that's free. The Isolate is still alive, it's on the call-stack. So is the IsolateGroup its in. This means that the NativeFinalizer::RunCallback must have still access to the group. The heap_ in the group is only freed in the IsolateGroup destructor, so that must still be alive as well. And the new_space_ in Heap must then be alive as well. So I'm not sure why we're getting an access violation.

It seems the heap_ has been corrupted somehow:

Screenshot 2024-12-22 224754

Actually, if the test was consistently succeeding on the bots before on all Dart versions, and now it's consistently failing. It could be that the Windows bots got updated yesterday (2024-12-19) with a newer Windows version (cc @athomas Did we update Windows bots yesterday?). And if it's failing for you locally, you probably have the same Windows version as the bots.

Just to clarify: the failing test is the one I added in my CL, which uses .refWithFinalizer (and sets calloc.nativeFree as the finalizer callback, which is CoTaskMemFree on Windows). Previously, there wasn’t any test that specifically used CoTaskMemFree as a finalizer callback. The existing test in tests/ffi/external_typed_data_finalizer_test.dart was using the POSIX free function as its finalizer callback. I believe my test uncovered an underlying issue with using CoTaskMemFree as a finalizer callback on Windows IA32.

For reference, I’m using Windows 11 Insider Preview Build 22635.4445, while the logs in the setup_build step suggest the Windows bots are using Windows-10-19045.2006.

There's a test tests/ffi/external_typed_data_finalizer_test.dart which uses posix calloc and free and this doesn't crash. I'm starting to suspect the problem is maybe with the Windows APIs we use in package:ffi?

That could be. That would be consistent with a Windows update changing the behavior of CoTaskMemFree on ia32. And it would be consistent with this error only showing up with the Windows memory API and not when changing to calloc and free.

Allocating and freeing pointers with CoTaskMemAlloc and CoTaskMemFree works fine outside of this context. The issue only arises when CoTaskMemFree is used as a finalizer callback. I couldn’t find any relevant information online about this behavior, which makes it all the more confusing.

(It's not consistent with getting access violations on two places in VM code when looking at the stack traces. Are the stack traces consistent on multiple runs in the debugger?

Yes, they are entirely consistent across runs.

I am no Windows expert. I would refer to the experts doing package:win32, oh that's you! 😄

Haha, I wouldn’t call myself a Windows expert, but I’ll take that as a compliment! 😄

As for my thoughts on this issue: since there haven’t been any reports from users experiencing this crash, and with IA32 support being phased out in Dart 3.8, it might not be worth investigating this further.

Thanks so much for looking into this with me, Daco. I’ve learned a lot along the way. 🙏

@dcharkes
Copy link
Contributor

It seems the heap_ has been corrupted somehow:

Yeah, that sounds about right. That would explain why we got two different access violations both on heap_.

As for my thoughts on this issue: since there haven’t been any reports from users experiencing this crash, and with IA32 support being phased out in Dart 3.8, it might not be worth investigating this further.

Agreed 👍

Thanks so much for looking into this with me, Daco. I’ve learned a lot along the way. 🙏

You're welcome! And thanks for your contributions (both in the SDK and package:win32)! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-native-interop Used for native interop related issues, including FFI. area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. crash Process exits with SIGSEGV, SIGABRT, etc. An unhandled exception is not a crash. gardening library-ffi os-windows triaged Issue has been triaged by sub team
Projects
None yet
Development

No branches or pull requests

3 participants