pjrt_c_api_cpu_plugin - MacOS amd and arm64 compilation #19152

dmarro89 · 2024-11-07T16:07:22Z

Hi,
I'm trying to compile the pjrt_c_api_cpu_plugin.so plugin on macOS (amd64 and arm64).
The build succeeds, but when I try to use it in gopjrt I get a segmentation violation error.

What I've done, in the XLA project:

./configure.py --backend CPU --os DARWIN
bazel build //xla/pjrt/c:pjrt_c_api_cpu_plugin.so

Then I ran the gopjrt tests, and they show the error.

If I use the JAX Metal plugin (pjrt_plugin_metal_14.dylib) instead, the project works fine.

Has anyone already compiled and used the plugin on macOS?

Thanks

janpfeifer · 2024-11-17T09:05:33Z

The same error occurs with bazel test //xla/pjrt:pjrt_c_api_client_test: it trips on a complex-number MatMul that is not defined in Eigen.

janpfeifer · 2024-11-17T09:15:13Z

Aside from these broken tests, a pjrt_c_api_cpu_plugin.so can be built on Mac (both x86_64 and arm64). However, compiling even very trivial HLO programs fails on Mac.

(XLA at commit 9ab7d704d7fe7e73fc3976adc2ccec070bc9a2ea)

To factor out the Go bindings (the gopjrt project) that we were using to test, I created the following additional mac_test.cc file under xla/pjrt. Instead of linking PJRT into the test (which fails on the complex-number MatMul described above), it reproduces what a PJRT client would do, which is to dlopen the plugin.

The C++ test compiles and runs, then fails without an error message on Mac, but succeeds as expected on Linux.

Here is the code:

Same includes as //xla/pjrt/c:pjrt_c_api_cpu_test, except that #include "xla/pjrt/c/pjrt_c_api_cpu_internal.h" was removed, to avoid linking in PJRT and the missing complex MatMul.
Same BUILD dependencies as //xla/pjrt/c:pjrt_c_api_cpu_test, except that "//xla/pjrt/c:pjrt_c_api_cpu_internal" was commented out.

namespace xla {
namespace {

using std::cout;
using std::endl;

const std::string kPjrtPluginPath = "/usr/local/lib/gomlx/pjrt/pjrt_c_api_cpu_plugin.so";

static void SetUpCpuPjRtApi() {
  std::string device_type = "cpu";
  auto status = ::pjrt::PjrtApi(device_type);
  if (!status.ok()) {
    TF_ASSERT_OK_AND_ASSIGN(
      const PJRT_Api* api,
      pjrt::LoadPjrtPlugin(device_type, kPjrtPluginPath)
    );
    cout << "Loaded PJRT from " << kPjrtPluginPath << endl;
  }
}

TEST(PjRtCApiClientTest, EndToEnd) {
  SetUpCpuPjRtApi();
  TF_ASSERT_OK_AND_ASSIGN(std::unique_ptr<PjRtClient> client,
                          GetCApiClient("cpu"));
  cout << "\tplatform_name=" << client->platform_name()
    << ", platform_id=" << client->platform_id()
    << endl;

  // Create f(x) = x^2
  cout << "Create f(x) = x^2:" << endl;
  XlaBuilder builder("x*x");
  Shape x_shape = ShapeUtil::MakeShape(F32, {});
  auto x = Parameter(&builder, 0, x_shape, "x");
  auto f = Mul(x, x);
  auto computation = builder.Build(f).value();
  cout << "\tComputation built." << endl;
  std::unique_ptr<PjRtLoadedExecutable> executable =
      client->Compile(computation, CompileOptions()).value();
  cout << "\tCompiled to executable." << endl;

  // Transfer concrete x value.
  std::vector<float> data{3};
  TF_ASSERT_OK_AND_ASSIGN(
      auto x_value,
      client->BufferFromHostBuffer(
          data.data(), x_shape.element_type(), x_shape.dimensions(),
          /*byte_strides=*/std::nullopt,
          PjRtClient::HostBufferSemantics::kImmutableOnlyDuringCall, nullptr,
          client->addressable_devices()[0]));
  cout << "Transferred value of x=" << data[0] << " to device." << endl;

  // Execute function.
  ExecuteOptions execute_options;
  execute_options.non_donatable_input_indices = {0};
  std::vector<std::vector<std::unique_ptr<PjRtBuffer>>> results =
      executable->Execute({{x_value.get()}}, execute_options)
          .value();
  ASSERT_EQ(results[0].size(), 1);
  auto& result_buffer = results[0][0];
  // How do we print the actual result !?
  cout << "Success" << endl;
}

}  // namespace
}  // namespace xla

When running with bazel test --test_output=streamed //xla/pjrt:mac_test, I get:

...
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from PjRtCApiClientTest
[ RUN      ] PjRtCApiClientTest.EndToEnd
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1731834725.730697  488487 pjrt_api.cc:99] GetPjrtApi was found for cpu at /usr/local/lib/gomlx/pjrt/pjrt_c_api_cpu_plugin.so
I0000 00:00:1731834725.730723  488487 pjrt_api.cc:78] PJRT_Api is set for device type cpu
Loaded PJRT from /usr/local/lib/gomlx/pjrt/pjrt_c_api_cpu_plugin.so
2024-11-17 09:12:05.731356: I xla/pjrt/pjrt_c_api_client.cc:127] PjRtCApiClient created.
        platform_name=cpu, platform_id=17910157233389077760
Create f(x) = x^2:
        Computation built.
2024-11-17 09:12:05.736506: I xla/service/llvm_ir/llvm_command_line_options.cc:50] XLA (re)initializing LLVM with options fingerprint: 13580721266388441979
FAIL: //xla/pjrt:mac_test (see /private/var/tmp/_bazel_m1/7c4fef9842af5ebc1d7b23e8ea13a23d/execroot/xla/bazel-out/darwin_arm64-opt/testlogs/xla/pjrt/mac_test/test.log)
Target //xla/pjrt:mac_test up-to-date:
  bazel-bin/xla/pjrt/mac_test
...

Yes, it fails without any kind of error message before it finishes compiling the HLO program.
The contents of test.log are the same.
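
The test above goes through pjrt::LoadPjrtPlugin. To take XLA's helper out of the picture as well, the loading step can be sketched with plain dlopen/dlsym. This is only a minimal sketch under stated assumptions: the function name FindGetPjrtApi is hypothetical, the symbol name GetPjrtApi is taken from the "GetPjrtApi was found" log line above, and the code deliberately stops at resolving the symbol. Calling it and interpreting the returned PJRT_Api would require the PJRT C API header.

```cpp
#include <dlfcn.h>

#include <cstdio>

// Hypothetical helper (not part of XLA): opens the plugin at `plugin_path`
// and returns the address of its exported "GetPjrtApi" symbol, or nullptr
// if either the library or the symbol cannot be found.
// On success the library handle is intentionally left open, so that the
// returned address stays valid for the lifetime of the process.
void* FindGetPjrtApi(const char* plugin_path) {
  void* handle = dlopen(plugin_path, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* err = dlerror();
    std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
    return nullptr;
  }

  dlerror();  // Clear any stale error state before calling dlsym.
  void* symbol = dlsym(handle, "GetPjrtApi");
  if (symbol == nullptr) {
    const char* err = dlerror();
    std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is null");
    dlclose(handle);
    return nullptr;
  }
  return symbol;
}
```

Calling FindGetPjrtApi with the plugin path used in the test should return a non-null address on both Linux and Mac if loading itself is fine, which would point at the compilation step rather than the dlopen step.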

janpfeifer · 2024-11-17T09:42:31Z

I'm trying to debug it. Building with -c dbg and running the binary directly gives more hints:

$ bazel build -c dbg //xla/pjrt:mac_test
... (build output)...
$ ./bazel-bin/xla/pjrt/mac_test
Running main() from gmock_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from PjRtCApiClientTest
[ RUN      ] PjRtCApiClientTest.EndToEnd
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1731836397.519141  525455 pjrt_api.cc:99] GetPjrtApi was found for cpu at /usr/local/lib/gomlx/pjrt/pjrt_c_api_cpu_plugin.so
I0000 00:00:1731836397.519471  525455 pjrt_api.cc:78] PJRT_Api is set for device type cpu
Loaded PJRT from /usr/local/lib/gomlx/pjrt/pjrt_c_api_cpu_plugin.so
2024-11-17 10:39:57.523582: I xla/pjrt/pjrt_c_api_client.cc:127] PjRtCApiClient created.
        platform_name=cpu, platform_id=17910157233389077760
Create f(x) = x^2:
        Computation built.
2024-11-17 10:39:57.618260: I xla/service/llvm_ir/llvm_command_line_options.cc:50] XLA (re)initializing LLVM with options fingerprint: 4980911022492290414
mac_test(37825,0x1e9c9fac0) malloc: *** error for object 0x400000000: pointer being realloc'd was not allocated
mac_test(37825,0x1e9c9fac0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Again, note that this only happens on the DARWIN platform.

janpfeifer · 2024-11-17T17:35:07Z

Also, if one changes xla/pjrt/pjrt_c_api_client_test.cc to load the plugin (as opposed to linking it in), it fails in the same way on Mac. To do that, just replace a couple of lines in this function:

static void SetUpCpuPjRtApi() {
  std::string device_type = "cpu";
  auto status = ::pjrt::PjrtApi(device_type);
  if (!status.ok()) {
    TF_ASSERT_OK(
        pjrt::SetPjrtApi(device_type, ::pjrt::cpu_plugin::GetCpuPjrtApi()));
  }
}

With (adjust kPjrtPluginPath to the path of the PJRT plugin):

const std::string kPjrtPluginPath = "/usr/local/lib/gomlx/pjrt/pjrt_c_api_cpu_plugin.so";
static void SetUpCpuPjRtApi() {
  std::string device_type = "cpu";
  auto status = ::pjrt::PjrtApi(device_type);
  if (!status.ok()) {
    TF_ASSERT_OK_AND_ASSIGN(
      const PJRT_Api* api,
      pjrt::LoadPjrtPlugin(device_type, kPjrtPluginPath));
  }
}

janpfeifer mentioned this issue Nov 17, 2024

Updated proto definitions gomlx/gopjrt#14

Merged
