Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Detect corrupted files, exit on exception (DEV-4457) #459

Closed
wants to merge 3 commits into from

Conversation

siers
Copy link
Contributor

@siers siers commented Dec 17, 2024

This corruption warning is non-fatal, so we have to check if the error code is set to a specific JWRN after jpeg_read_scanlines. Test added that an exception is created in img.read.


References:

https://github.com/mozilla/mozjpeg/blob/6c9f0897afa1c2738d7222a0a9ab49e8b536a267/jerror.h#L195

JMESSAGE(JWRN_HUFF_BAD_CODE, "Corrupt JPEG data: bad Huffman code")

https://github.com/mozilla/mozjpeg/blob/6c9f0897afa1c2738d7222a0a9ab49e8b536a267/jerror.h#L280-L283

/* Nonfatal errors (we can keep going, but the data is probably corrupt) */
#define WARNMS(cinfo,code)  \
  ((cinfo)->err->msg_code = (code), \
   (*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), -1))

https://github.com/mozilla/mozjpeg/blob/6c9f0897afa1c2738d7222a0a9ab49e8b536a267/jdphuff.c#L551-L552

        if (s) {
          if (s != 1)           /* size of new coef should always be 1 */
            WARNMS(cinfo, JWRN_HUFF_BAD_CODE);
          CHECK_BIT_BUFFER(br_state, 1, goto undoit);

https://github.com/mozilla/mozjpeg/blob/6c9f0897afa1c2738d7222a0a9ab49e8b536a267/jdhuff.c#L463-L466

  if (l > 16) {
    WARNMS(state->cinfo, JWRN_HUFF_BAD_CODE);
    return 0;                   /* fake a zero as the safest result */
  }

Copy link

linear bot commented Dec 17, 2024

@siers siers force-pushed the feat/DEV-4457-detect-corrupt branch 2 times, most recently from 86469df to e02685a Compare December 18, 2024 14:11
@siers siers force-pushed the feat/DEV-4457-detect-corrupt branch from e02685a to 6ad6dc5 Compare December 18, 2024 14:27
src/sipi.cpp Outdated
@@ -862,6 +862,7 @@ int main(int argc, char *argv[])
}
} catch (Sipi::SipiImageError &err) {
std::cerr << err << std::endl;
exit(1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, does that mean that in the past, all SipiImageError resulted in return code 0, but now, they will all result in return code 1?

That would not be the expected behaviour. Because often, an image is only slightly corrupted, but perfectly readable by most viewers. For example, the ca. 66'000 images of LHTT produced ca. 14'000 error logs. But most of them are negligible, because there is no real defect in the image. We'd like that SIPI only aborts with non-0 in these very specific cases:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very odd, but it was also my intention to check whether this new behavior breaks the build. Unfortunately the main build is broken. 🤦

Anyhow, I'll restrict it to the behavior you've described.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the last commit, after I resolved the build issues in another PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siers In this notion page, we discussed that we should probably not modify the SIPI behaviour at all -> https://www.notion.so/SIPI-error-logs-1748946b7d4080618e91cf0527b9d7a7?d=1748946b7d408031a344001cf88f05f8&pvs=4#1748946b7d408058825ac23f05bcfc38

Have you seen the discussion there?

Sorry that this comes only now, after you have already invested time. We should have spent more time on thinking about the issue, before creating a Linear ticket.

I suggest that you don't spend any more time on this PR, before we have decided what to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that. Don't worry, the hardest part was fixing the build and we needed to do that anyway.

@siers siers force-pushed the feat/DEV-4457-detect-corrupt branch from 6ad6dc5 to d1693c6 Compare January 9, 2025 12:21
Copy link

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.

Project coverage is 23.2%. Comparing base (de9e393) to head (b8c39b8).

Files with missing lines Patch % Lines
src/formats/SipiIOJpeg.cpp 42.8% 0 Missing and 4 partials ⚠️
src/sipi.cpp 0.0% 1 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/SipiImageError.hpp 29.4% <100.0%> (+16.5%) ⬆️
src/sipi.cpp 13.2% <0.0%> (-0.1%) ⬇️
src/formats/SipiIOJpeg.cpp 47.1% <42.8%> (-0.1%) ⬇️

... and 1 file with indirect coverage changes

@siers siers closed this Jan 14, 2025
@siers siers deleted the feat/DEV-4457-detect-corrupt branch January 14, 2025 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants