fix: intermittent segfault in PyPy #3089

ianna · 2024-04-22T12:30:28Z

awkward-cpp/tests-cpu-kernels-explicit/test_unit_cpuawkward_UnionArray32_flatten_length_64.py::test_unit_cpuawkward_UnionArray32_flatten_length_64_3 FAILED [ 56%]
Error: test_unit_cpuawkward_UnionArray32_flatten_length_64_3

assert [0] == approx([7 ± 7.0e-06])
  
  comparison failed. Mismatched elements: 1 / 1:
  Max absolute difference: 7
  Max relative difference: inf
  Index | Obtained | Expected   
  0     | 0        | 7 ± 7.0e-06
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    def test_unit_cpuawkward_UnionArray32_flatten_length_64_3():
        total_length = [123]
        total_length = (ctypes.c_int64*len(total_length))(*total_length)
        fromtags = [0, 0, 0, 0]
        fromtags = (ctypes.c_int8*len(fromtags))(*fromtags)
        fromindex = [0, 1, 2, 3]
        fromindex = (ctypes.c_int32*len(fromindex))(*fromindex)
        length = 4
        offsetsraws = [[0, 1, 3, 5, 7], [1, 3, 5, 7, 9]]
        offsetsraws_ptr = (ctypes.POINTER(ctypes.c_int64) * len(offsetsraws))()
        for i in range(len(offsetsraws)):
            offsetsraws_ptr[i] = (ctypes.c_int64 * len(offsetsraws[i]))(*offsetsraws[i])
        offsetsraws = offsetsraws_ptr
        funcC = getattr(lib, 'awkward_UnionArray32_flatten_length_64')
        ret_pass = funcC(total_length, fromtags, fromindex, length, offsetsraws)
        pytest_total_length = [7]
>       assert total_length[:len(pytest_total_length)] == pytest.approx(pytest_total_length)
E       assert [0] == approx([7 ± 7.0e-06])
E         
E         comparison failed. Mismatched elements: 1 / 1:
E         Max absolute difference: 7
E         Max relative difference: inf
E         Index | Obtained | Expected   
E         0     | 0        | 7 ± 7.0e-06
fromindex  = <_ctypes.array.c_int_Array_4 object at 0x000000000428f448>
fromtags   = <_ctypes.array.c_byte_Array_4 object at 0x000000000428f430>
funcC      = <ctypes.CDLL.__init__.<locals>._FuncPtr object at 0x00000000018b5388>
i          = 1
length     = 4
offsetsraws = <_ctypes.array.LP_c_long_Array_2 object at 0x000000000428f478>
offsetsraws_ptr = <_ctypes.array.LP_c_long_Array_2 object at 0x000000000428f478>
pytest_total_length = [7]
ret_pass   = <awkward_cpp._kernel_signatures.ERROR object at 0x0000000004dd0950>
total_length = <_ctypes.array.c_long_Array_1 object at 0x000000000428f418>
awkward-cpp/tests-cpu-kernels-explicit/test_unit_cpuawkward_UnionArray32_flatten_length_64.py:70: AssertionError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>
> /home/runner/work/awkward/awkward/awkward-cpp/tests-cpu-kernels-explicit/test_unit_cpuawkward_UnionArray32_flatten_length_64.py(70)test_unit_cpuawkward_UnionArray32_flatten_length_64_3()
-> assert total_length[:len(pytest_total_length)] == pytest.approx(pytest_total_length)
(Pdb) 
!!!!!!!!!!!!!!!!!!! _pytest.outcomes.Exit: Quitting debugger !!!!!!!!!!!!!!!!!!!
======================= 1 failed, 3261 passed in 18.02s ========================
Error: Process completed with exit code 2.

ianna

nope.

…nels

ianna

@jpivarski - removing coverage from the PyPy test job cures the segfault! I think, you can merge this PR to fix the failing jobs.

jpivarski

Most of the diff in this PR removes the unnecessary length argument from awkward_unique_ranges, which can't be responsible for the PyPy segfault unless the length argument had been passed inconsistently, and now it's consistent. (It doesn't look that way—it looks to me like a neutral-refactoring.)

Another part of this diff is explicit integer casts in the C code. That also should be unrelated to PyPy segfaults, since it's not in the part of the code that's different between PyPy and CPython. Perhaps PyPy's ctypes is different, passing data as a different integer type than intended, but these are conversion-casts in the C code, not reinterpret-casts.

I don't see anything in this PR that explains why PyPy was segfaulting, or would cause PyPy to stop segfaulting. But if it's not segfaulting now, I'm definitely willing to merge this update and see what happens to the PyPy test for the next few weeks. (Everything in this PR is a net-positive, regardless of whether it fixes the segfault or not.) If the segfault is fixed, then it's likely something that I didn't catch by eye, but is a correction nonetheless.

@ianna, do you have a theory about what was causing the segfault before and not now? Some line of code in the diff that you can point to? Either way, go ahead and merge this PR and we'll see what happens. I'll even propagate it out to the Ragged library, where I initially encountered the segfault, to see if it's fixed there, too.

jpivarski · 2024-05-02T14:34:57Z

Actually, don't merge this until #3086 is merged. There's going to be a conflict, and it will be easier to resolve the conflict in this PR (the diff is smaller). So let's get #3086 into main and then update this PR to main to deal with that.

The conflict will be in how removing the unnecessary length argument is done.

jpivarski · 2024-05-02T16:46:06Z

#3086 has been merged, and so I updated this branch and will run through the tests again. If all the tests pass, this PR can be merged, so I've enabled auto-merge.

jpivarski · 2024-05-02T16:54:39Z

To record what I learned from @ianna during our meeting: the relevant change is that this PR runs the code-coverage test for CPython and not for PyPy. The hypothesis is that the segfault was never in Awkward, but in pytest-cov. (You have a link to an issue about that in the pytest-cov project, right, @ianna?) That hypothesis would make sense of a few things:

The segfault was only seen in CI; @ianna wasn't running pytest-cov locally and didn't see the segfault (unless you eventually added that to your local test?).
The segfault was first discovered in ragged, which has a very different test suite than Awkward, but it also runs pytest-cov.

Although it's still possible that an error in Awkward is only revealed when pytest-cov runs in PyPy (because it somehow stomps on memory that doesn't get stomped on without pytest-cov), if I remove pytest-cov from ragged's tests and get no segfault, that will be a strong indication that it's caused by pytest-cov and not Awkward, since ragged's test suite is so different from Awkward's. It wouldn't be 100% conclusive, but it would be conclusive enough for me.

jpivarski · 2024-05-02T17:00:35Z

There's a test of our minimally supported pyarrow version that intermittently, but rarely, fails because of

AssertionError: assert [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', ' ']]] == [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', '', '']]]

I am very sure that this one is due to something in pyarrow. We could increase our minimal pyarrow, but that wouldn't be for compatibility reasons, it would be because pyarrow 7 has issues. Instead, I'll just re-run this test by hand.

jpivarski · 2024-05-02T17:56:45Z

I'm testing it in ragged here: scikit-hep/ragged#47

Although I remember segfaults in PyPy, I'm not seeing them now—with or without codecov.

jpivarski · 2024-05-03T19:45:50Z

FYI: #3097 was updated to main, with this fix in, and unfortunately, there's still a PyPy segfault: https://github.com/scikit-hep/awkward/actions/runs/8943876656/job/24569544958?pr=3097

I was going to make the PyPy test required, but it looks like it's not ready for that, yet.

ianna · 2024-05-03T19:49:21Z

FYI: #3097 was updated to main, with this fix in, and unfortunately, there's still a PyPy segfault: https://github.com/scikit-hep/awkward/actions/runs/8943876656/job/24569544958?pr=3097

I was going to make the PyPy test required, but it looks like it's not ready for that, yet.

yes, I was just looking at it :-(

debug: add debugging flag

0c10f5e

ianna linked an issue Apr 22, 2024 that may be closed by this pull request

Intermittent segfault in pypy #2971

Closed

pre-commit-ci bot temporarily deployed to docs April 22, 2024 12:43 Inactive

test: debug

4c40472

ianna force-pushed the ianna/2971-intermittent-segfault-in-pypy branch from b682f40 to 4c40472 Compare April 22, 2024 13:09

ianna temporarily deployed to docs April 22, 2024 13:19 — with GitHub Actions Inactive

fix: allocate memory for 2d arrays in tests

e82ee31

ianna changed the title ~~test: add debugging~~ fix: intermittent segfault in PyPy Apr 30, 2024

Merge branch 'main' into ianna/2971-intermittent-segfault-in-pypy

532ad1a

ianna temporarily deployed to docs April 30, 2024 14:49 — with GitHub Actions Inactive

ianna commented May 1, 2024

View reviewed changes

fix: remove legacy parameter from api

381d34d

ianna temporarily deployed to docs May 1, 2024 22:34 — with GitHub Actions Inactive

ianna added 2 commits May 2, 2024 01:05

debug: exclude some tests that rely on 2d array parameters in cpu ker…

273f06b

…nels

debug: exclude cpu kernel unt tests with 2d arrays as parameters

e4baf54

ianna force-pushed the ianna/2971-intermittent-segfault-in-pypy branch from 63243ba to e4baf54 Compare May 1, 2024 23:40

ianna added 3 commits May 2, 2024 01:46

fix: typo

fb09163

test: remove coverage for pypy

ddd6b0d

fix: restore tests

8ce374c

ianna temporarily deployed to docs May 2, 2024 01:26 — with GitHub Actions Inactive

ianna added 3 commits May 2, 2024 03:37

fix: restore cpu kernel unit tests

410c40d

fix: remove debugger

338e977

fix: restore tests

23b890f

ianna had a problem deploying to docs May 2, 2024 01:48 — with GitHub Actions Error

fix: add more tests

1418372

ianna requested a review from jpivarski May 2, 2024 01:56

ianna temporarily deployed to docs May 2, 2024 02:02 — with GitHub Actions Inactive

ianna commented May 2, 2024

View reviewed changes

jpivarski approved these changes May 2, 2024

View reviewed changes

Merge branch 'main' into ianna/2971-intermittent-segfault-in-pypy

90808bd

jpivarski enabled auto-merge (squash) May 2, 2024 16:46

jpivarski disabled auto-merge May 2, 2024 16:57

jpivarski deployed to docs May 2, 2024 16:58 — with GitHub Actions View deployment

jpivarski enabled auto-merge (squash) May 2, 2024 17:05

jpivarski merged commit 7f1d261 into main May 2, 2024
41 checks passed

jpivarski deleted the ianna/2971-intermittent-segfault-in-pypy branch May 2, 2024 17:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: intermittent segfault in PyPy #3089

fix: intermittent segfault in PyPy #3089

ianna commented Apr 22, 2024 •

edited

Loading

ianna left a comment •

edited

Loading

ianna left a comment

jpivarski left a comment

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 3, 2024

ianna commented May 3, 2024

fix: intermittent segfault in PyPy #3089

fix: intermittent segfault in PyPy #3089

Conversation

ianna commented Apr 22, 2024 • edited Loading

ianna left a comment • edited Loading

Choose a reason for hiding this comment

ianna left a comment

Choose a reason for hiding this comment

jpivarski left a comment

Choose a reason for hiding this comment

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 2, 2024

jpivarski commented May 3, 2024

ianna commented May 3, 2024

ianna commented Apr 22, 2024 •

edited

Loading

ianna left a comment •

edited

Loading