Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Eigen integration silently fails with recent versions of Eigen #746

Open
dfm opened this issue Oct 3, 2024 · 6 comments
Open

[BUG]: Eigen integration silently fails with recent versions of Eigen #746

dfm opened this issue Oct 3, 2024 · 6 comments

Comments

@dfm
Copy link

dfm commented Oct 3, 2024

Problem description

When I build the following simple nanobind module with the current master branch of Eigen:

#include <nanobind/nanobind.h>
#include <nanobind/eigen/dense.h>

namespace nb = nanobind;

NB_MODULE(nanobind_example_ext, m) {
    using Matrix = Eigen::Matrix<int, 1, 1>;
    m.def("add", [](Matrix a, Matrix b) { return a + b; });
}

The output of calling:

import numpy as np
nanobind_example.add(np.int32([0]), np.int32([0]))

is array([17726950], dtype=int32) instead of the expected array([0], dtype=int32).

This can be fixed by adding a return type:

-     m.def("add", [](Matrix a, Matrix b) { return a + b; });
+     m.def("add", [](Matrix a, Matrix b) -> Matrix { return a + b; });

and the example above works properly using the last stable release of Eigen, so I expect it has something to do with a change in the type produced by the addition. Unfortunately I don't know enough about the nanobind (or Eigen!) internals to know where to start debugging this, but I wanted to check here if this was an expected sharp edge, or if this is something that should work.

I note that I originally came across this issue when I discovered a test failure here:

def test10_eigen_scalar_default():
x = t.default_arg()
assert x==0

which has this same syntax.

Reproducible example code

#include <nanobind/nanobind.h>
#include <nanobind/eigen/dense.h>

namespace nb = nanobind;

NB_MODULE(nanobind_example_ext, m) {
    using Matrix = Eigen::Matrix<int, 1, 1>;
    m.def("add", [](Matrix a, Matrix b) { return a + b; });
}
import numpy as np
assert nanobind_example.add(np.int32([0]), np.int32([0])) == 0
@wjakob
Copy link
Owner

wjakob commented Oct 4, 2024

I can't reproduce this issue on my machine. What version of Eigen are you using? What OS is this on?

@wjakob
Copy link
Owner

wjakob commented Oct 4, 2024

I did sme more checking to see if there could be an issue involving ownership management: the expression caster turns the addition expression into a Eigen::Array<int, 1, 1, 0, 1, 1> and then uses the rv_policy::move RVP to convert it into a Python object. Since the array only encapsulates a small amount of memory, nanobind downgrades this to rv_policy::copy and creates a new NumPy array containing a copy of the result. It all looks good to me, so I don't understand the issue.

The CI also hasn't had shown any failures here in the last months. Could it be that there is a problem on your end? If this is truly a nanobind bug, could I ask you to make a PR that causes a failure to help reproducibility on my end?

@dfm
Copy link
Author

dfm commented Oct 4, 2024

Thanks for looking into this! I'm seeing this on my macbook with commit 44b16f48cb5c5e7b9f5ed2b4d6a14ed14ab302e1 of Eigen (the current tip of the master from https://gitlab.com/libeigen/eigen). The original error showed up in our internal CI at Google with an upgrade of Eigen to commit 2d4c9b400cca33d2f5cf316efc7151236244edb1 (the nanobind test10_eigen_scalar_default started failing then).

Let me try bisecting Eigen to see if I can narrow down when this occurred. I'll report back ASAP with more info. Thanks!

@dfm
Copy link
Author

dfm commented Oct 4, 2024

Bisecting was actually fast! It looks like the offending Eigen commit actually is 2d4c9b400cca33d2f5cf316efc7151236244edb1. I'm happy to open a PR using this Eigen commit to try to reproduce in CI if that would be helpful.

@hawkinsp
Copy link
Contributor

hawkinsp commented Oct 4, 2024

One possibly relevant thing is that both Google's CI and your macbook likely use clang as a compiler.

I tried to reproduce the same failure on my Linux machine with Eigen from master and was unable, but perhaps I didn't get the compiler stars to align right.

dfm added a commit to dfm/nanobind that referenced this issue Oct 4, 2024
This isn't really meant to be merged, but it's a follow up from
wjakob#746 demonstrating the test
failure with the current master branch of Eigen.

This workflow seems to fail when run on macOS, and (as @hawkinsp
suggested) I expect the issue is related to using clang instead of gcc.
@dfm
Copy link
Author

dfm commented Oct 4, 2024

I expect @hawkinsp is right here! I've opened #749 demonstrating the issue in CI, and everything works fine using GCC on Ubuntu, but fails on macOS with clang. Hopefully this can be illuminating as to what's going on here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants