Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove incorrect block index sections from asdf test data #296

Merged
merged 2 commits into from
Oct 13, 2023

Conversation

braingram
Copy link
Contributor

Asdf is making some major updates to internal block reading and writing code (see this PR for the extensive changes).

One of the changes includes issuing a new warning when a file is opened which contains an incorrect block index. The block index is a small, optional YAML document at the end of an ASDF file that contains the offsets of each ASDF binary block within the ASDF file (see the asdf-standard docs for more details). Hand editing ASDF files can often lead to incorrect offsets in the block index. When asdf detects a block index that is incorrect (or if no block index is found) it ignores the block index and falls back to reading blocks sequentially starting with the first.

With dkist main tested against the above asdf PR several tests are failing (see here for the full log):

=========================== short test summary info ============================
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_read_all_schema_versions[eit_dataset_asdf_path0]
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_read_all_schema_versions[eit_dataset_asdf_path1]
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_read_all_schema_versions[eit_dataset_asdf_path2]
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_read_all_schema_versions[eit_dataset_asdf_path3]
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_loader_getitem_with_chunksize[eit_dataset_asdf_path0]
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_loader_getitem_with_chunksize[eit_dataset_asdf_path1]
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_loader_getitem_with_chunksize[eit_dataset_asdf_path2]
FAILED dkist/dkist/io/asdf/tests/test_dataset.py::test_loader_getitem_with_chunksize[eit_dataset_asdf_path3]
ERROR dkist/dkist/io/tests/test_file_manager.py::test_length_one_first_array_axis
ERROR dkist/dkist/io/tests/test_file_manager.py::test_download_quality[kwargs0]
ERROR dkist/dkist/io/tests/test_file_manager.py::test_download_quality[kwargs1]
ERROR dkist/dkist/io/tests/test_file_manager.py::test_download_quality_movie[kwargs0]
ERROR dkist/dkist/io/tests/test_file_manager.py::test_download_quality_movie[kwargs1]
======== 8 failed, 286 passed, 2 skipped, 5 errors in 138.79s (0:02:18) ========

This is due to several test files in the dkist repository that contain block index offsets that are incorrect (see the changed files in this PR).

This PR deletes the invalid block index from each of these files.

@codecov
Copy link

codecov bot commented Aug 14, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (b6cb4c1) 98.28% compared to head (d7f149c) 98.28%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #296   +/-   ##
=======================================
  Coverage   98.28%   98.28%           
=======================================
  Files          40       40           
  Lines        2452     2452           
=======================================
  Hits         2410     2410           
  Misses         42       42           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@braingram
Copy link
Contributor Author

I wasn't sure if these changes warranted a changelog. I'm unable to change labels (if there is one that allows no-changelog-entry or similar).

@braingram braingram mentioned this pull request Oct 12, 2023
@Cadair Cadair added the No Changelog Entry Needed This pull request does not need a changelog entry label Oct 13, 2023
@Cadair Cadair merged commit 1529024 into DKISTDC:main Oct 13, 2023
11 of 12 checks passed
@braingram braingram deleted the asdf_bad_indices branch October 13, 2023 18:17
Cadair added a commit to Cadair/dkist that referenced this pull request Oct 16, 2023
@Cadair
Copy link
Member

Cadair commented Oct 19, 2023

I don't know if it's related to this PR yet or not, but I just tried to write a test which makes use of the WCSes in these files and it's now causing segfaults, which is fun. @braingram got any ideas? 😆

@Cadair
Copy link
Member

Cadair commented Oct 19, 2023

Without this PR the 1.0.0 eit_dataset file does not segfault the others do. with this PR 1.0.0 does as well but not 1.1.0 (which wasn't modified in this PR).

@braingram
Copy link
Contributor Author

Would you point me to the test?

Using asdf main and dkist main and dkist/data/test/eit_dataset-1.0.0.asdf, if I open the asdf file, take a reference to the wcs, then close the asdf file and access the wcs I do get a segfault.

af = asdf.open('dkist/data/test/eit_dataset-1.0.0.asdf')
w = af['dataset'].wcs
af.close()
print(w)  # segfault

Keeping the file open (which by default is being memory mapped) prevents the segfault. Also, passing copy_arrays=True to asdf.open also prevents the segfault (as the arrays contained in the wcs are no longer memory mapped). I see the same behavior with asdf 2.15.2. However I am also seeing the same behavior after reverting 1529024 so I don't think I'm testing the same thing you are based on your description of the issue.

braingram added a commit to braingram/dkist that referenced this pull request Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
No Changelog Entry Needed This pull request does not need a changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants