AssertionError in test_load_image #8274

Closed
KumoLiu opened this issue Dec 30, 2024 · 4 comments · Fixed by #8275 or #8297
Labels
bug Something isn't working

Comments

@KumoLiu
Contributor

KumoLiu commented Dec 30, 2024

[2024-12-28T19:19:03.755Z] ======================================================================

[2024-12-28T19:19:03.755Z] FAIL: test_nibabel_reader_gpu_3 (tests.test_load_image.TestLoadImage)

[2024-12-28T19:19:03.755Z] ----------------------------------------------------------------------

[2024-12-28T19:19:03.755Z] Traceback (most recent call last):

[2024-12-28T19:19:03.755Z]   File "/usr/local/lib/python3.10/dist-packages/parameterized/parameterized.py", line 620, in standalone_func

[2024-12-28T19:19:03.755Z]     return func(*(a + p.args), **p.kwargs, **kw)

[2024-12-28T19:19:03.755Z]   File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_load_image.py", line 236, in test_nibabel_reader_gpu

[2024-12-28T19:19:03.755Z]     self.assertTrue(torch.equal(result_cpu, result.cpu()))

[2024-12-28T19:19:03.755Z] AssertionError: False is not true
@KumoLiu KumoLiu added the bug Something isn't working label Dec 30, 2024
@KumoLiu
Contributor Author

KumoLiu commented Dec 30, 2024

Hi @yiheng-wang-nv, could you please take a look at this issue? The test case fails intermittently in the 24.08 base container on the A100. Thanks.

@KumoLiu
Contributor Author

KumoLiu commented Jan 6, 2025

The test still fails randomly. cc @yiheng-wang-nv

[2025-01-03T19:18:59.181Z] ======================================================================

[2025-01-03T19:18:59.181Z] FAIL: test_nibabel_reader_gpu_3 (tests.test_load_image.TestLoadImage)

[2025-01-03T19:18:59.181Z] ----------------------------------------------------------------------

[2025-01-03T19:18:59.181Z] Traceback (most recent call last):

[2025-01-03T19:18:59.181Z]   File "/usr/local/lib/python3.10/dist-packages/parameterized/parameterized.py", line 620, in standalone_func

[2025-01-03T19:18:59.181Z]     return func(*(a + p.args), **p.kwargs, **kw)

[2025-01-03T19:18:59.181Z]   File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_load_image.py", line 236, in test_nibabel_reader_gpu

[2025-01-03T19:18:59.181Z]     self.assertTrue(torch.allclose(result_cpu, result.cpu(), atol=1e-6))

[2025-01-03T19:18:59.181Z] AssertionError: False is not true

[2025-01-03T19:18:59.181Z] 

[2025-01-03T19:18:59.181Z] ----------------------------------------------------------------------

KumoLiu pushed a commit that referenced this issue Jan 6, 2025
Related to #8274, this PR is meant to help diagnose the failure. When I
used the same environment as the nightly test, I could not reproduce
the error. I therefore hope the new change will show more information
about it.
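
As a rough illustration of what "more information about the error" could look like, the sketch below summarizes how two arrays differ instead of reporting only `False is not true`. The helper name `summarize_mismatch` and the reported fields are hypothetical and are not taken from the MONAI test suite; it uses NumPy only, so tensors would need to be converted first.

```python
import numpy as np


def summarize_mismatch(expected, actual, atol=1e-6):
    """Return simple statistics describing how two arrays differ.

    Hypothetical helper for illustration; not part of MONAI.
    """
    expected = np.asarray(expected, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    if expected.shape != actual.shape:
        # Element-wise statistics are meaningless when shapes differ.
        return {"shape_match": False}
    # Compare only positions where both values are finite.
    finite = np.isfinite(expected) & np.isfinite(actual)
    diff = np.abs(expected[finite] - actual[finite])
    return {
        "shape_match": True,
        "nan_in_expected": int(np.isnan(expected).sum()),
        "nan_in_actual": int(np.isnan(actual).sum()),
        "max_abs_diff": float(diff.max()) if diff.size else float("nan"),
        "num_over_atol": int((diff > atol).sum()),
    }
```

Printing such a summary in the failure message would distinguish a small numerical drift from NaNs or wildly different values.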


### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: Yiheng Wang <[email protected]>
@KumoLiu
Contributor Author

KumoLiu commented Jan 8, 2025

More detailed error. cc @yiheng-wang-nv


[2025-01-07T19:18:57.654Z] ======================================================================

[2025-01-07T19:18:57.654Z] FAIL: test_nibabel_reader_gpu_3 (tests.test_load_image.TestLoadImage)

[2025-01-07T19:18:57.654Z] ----------------------------------------------------------------------

[2025-01-07T19:18:57.654Z] Traceback (most recent call last):

[2025-01-07T19:18:57.654Z]   File "/usr/local/lib/python3.10/dist-packages/parameterized/parameterized.py", line 620, in standalone_func

[2025-01-07T19:18:57.654Z]     return func(*(a + p.args), **p.kwargs, **kw)

[2025-01-07T19:18:57.654Z]   File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_load_image.py", line 236, in test_nibabel_reader_gpu

[2025-01-07T19:18:57.654Z]     assert_allclose(result_cpu, result.cpu(), atol=1e-6)

[2025-01-07T19:18:57.654Z]   File "/home/jenkins/agent/workspace/Monai-latest-image/tests/utils.py", line 135, in assert_allclose

[2025-01-07T19:18:57.654Z]     np.testing.assert_allclose(actual, desired, *args, **kwargs)

[2025-01-07T19:18:57.654Z]   File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 1592, in assert_allclose

[2025-01-07T19:18:57.654Z]     assert_array_compare(compare, actual, desired, err_msg=str(err_msg),

[2025-01-07T19:18:57.654Z]   File "/usr/lib/python3.10/contextlib.py", line 79, in inner

[2025-01-07T19:18:57.654Z]     return func(*args, **kwds)

[2025-01-07T19:18:57.654Z]   File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 783, in assert_array_compare

[2025-01-07T19:18:57.654Z]     flagged = func_assert_same_pos(x, y, func=isnan, hasval='nan')

[2025-01-07T19:18:57.654Z]   File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 753, in func_assert_same_pos

[2025-01-07T19:18:57.654Z]     raise AssertionError(msg)

[2025-01-07T19:18:57.654Z] AssertionError: 

[2025-01-07T19:18:57.654Z] Not equal to tolerance rtol=1e-07, atol=1e-06

[2025-01-07T19:18:57.654Z] 

[2025-01-07T19:18:57.654Z] x and y nan location mismatch:

[2025-01-07T19:18:57.654Z]  x: array([[[[0.896333, 0.001718, 0.608176, ..., 0.833388, 0.786462,

[2025-01-07T19:18:57.654Z]           0.769311],

[2025-01-07T19:18:57.654Z]          [0.26893 , 0.231931, 0.654519, ..., 0.170201, 0.532211,...

[2025-01-07T19:18:57.654Z]  y: array([[[[ 4.876519e-43,  0.000000e+00,  0.000000e+00, ...,

[2025-01-07T19:18:57.654Z]            1.786100e+00, -2.456252e+16,  1.419553e+00],

[2025-01-07T19:18:57.654Z]          [-2.201973e-17,  1.678210e+00,  8.583682e-25, ...,...

[2025-01-07T19:18:57.654Z] 

[2025-01-07T19:18:57.654Z] ----------------------------------------------------------------------

[2025-01-07T19:18:57.654Z] Ran 16089 tests in 2528.825s
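
For context on the "nan location mismatch" message in the log above: `np.testing.assert_allclose` first checks that NaNs occur at the same positions in both arrays, and only then applies the tolerances. The sketch below reproduces that behaviour with made-up values (the first three numbers are copied from the `x` array in the log, the NaN is placed artificially); the helper name `allclose_failure_message` is hypothetical.

```python
import numpy as np


def allclose_failure_message(x, y, atol=1e-6):
    """Return the AssertionError text from assert_allclose, or None if it passes."""
    try:
        np.testing.assert_allclose(x, y, atol=atol)
    except AssertionError as err:
        return str(err)
    return None


expected = np.array([0.896333, 0.001718, 0.608176], dtype=np.float32)
# Illustrative corruption: one element replaced by NaN.
actual = np.array([0.896333, np.nan, 0.608176], dtype=np.float32)

message = allclose_failure_message(expected, actual)
```

Here `message` contains the location-mismatch text, whereas two arrays with NaNs at identical positions pass because `equal_nan` defaults to `True`.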

@yiheng-wang-nv
Contributor

Some values loaded on the GPU are extremely small. I checked the recent nightly tests, and the error always happens in the 24.08 base container with the A100. Hi @KumoLiu, I think the issue may come from other software rather than the code here.

To resolve the issue, I would like to run more tests. Do you know whether we can manually trigger the test on the same machine from a different branch?

KumoLiu pushed a commit that referenced this issue Jan 14, 2025
Fixes #8274 .

### Description

The new test has already been run in the same 24.08 + A100 environment.

I ran some tests but could not reproduce the original test case error
(NaN values, or extremely small or large values, in the loaded data).
Since only the 24.08 base image has the issue (24.10 does not), I decided
to use a different test case for 24.08 and prepared this PR.
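
A minimal sketch of choosing a different test case for one container version, assuming the version is exposed through an environment variable. The variable name `BASE_CONTAINER_VERSION`, the case dictionaries, and `select_case` are all hypothetical and do not reflect how the actual PR implements the change.

```python
import os

# Hypothetical environment variable name, used for illustration only.
CONTAINER_VERSION_ENV = "BASE_CONTAINER_VERSION"

# Hypothetical test-case descriptions.
DEFAULT_CASE = {"name": "default_gpu_case"}
FALLBACK_CASE = {"name": "fallback_case_for_24_08"}


def select_case(container_version):
    """Pick the fallback case only for the affected 24.08 base image."""
    if container_version == "24.08":
        return FALLBACK_CASE
    return DEFAULT_CASE


# Resolve the case once at import time, as a test module might.
case = select_case(os.environ.get(CONTAINER_VERSION_ENV))
```

Keeping the selection in one small function makes it easy to drop the special case once the 24.08 image is no longer tested.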

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: Yiheng Wang <[email protected]>