Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TIFF: Add ASCII validation for ASCII fields #487

Open
wants to merge 4 commits into
base: integration
Choose a base branch
from

Conversation

david-russo
Copy link
Member

  • Added ASCII validation for ASCII fields
  • Fixed problem reporting InkNames field values

Fixes a bug in `readASCIIArray` which prevented the array from being
populated. This went unnoticed because it was only used for one
uncommon field (InkNames), but now the same logic is shared for all
ASCII fields.
@codecov
Copy link

codecov bot commented Oct 11, 2019

Codecov Report

Merging #487 into integration will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff               @@
##             integration     #487   +/-   ##
==============================================
  Coverage          49.69%   49.69%           
  Complexity           987      987           
==============================================
  Files                 55       55           
  Lines               7750     7750           
  Branches            1406     1406           
==============================================
  Hits                3851     3851           
  Misses              3435     3435           
  Partials             464      464

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e59ca7...0a3982a. Read the comment docs.

@carlwilson carlwilson added feature New functionality to be developed P2 Medium priority issues to be scheduled in a future release labels Oct 18, 2019
@carlwilson carlwilson added this to the v1.24-m4 Release milestone Oct 18, 2019
if (invalidAscii) {
_info.setMessage(new ErrorMessage(
MessageConstants.TIFF_HUL_72, value));
_info.setValid(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi I have mixed feelings with declaring invalid TIFF files just because of non ASCII (but ISO-8859-1) characters.
Indeed, I know, because this control doesn't exist, that we have thousand of images that have "Bibliothèque nationale de France" in comment but are otherwise perfectly fine.

It would be nice if we could make this configurable like the byteoffset=true parameter to allow it to simply be a warning.
A parameter like invalidNonAsciiCharacters with the default value true will be great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tledoux I've been working on a more generic way of ignoring / downgrading / upgrading errors by id using a configuration file passed at run-time. It might prove a useful addition as it makes reporting more flexible. Will do my best to get the PR out for the first part of next week for review.

Copy link
Member Author

@david-russo david-russo Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, yeah, we ourselves have quite a few (somewhat questionable) "©" symbols peppered about our TIFF collections. That was actually the motivator for the addition of this check. If JHOVE had had it earlier we would have picked up on this issue sooner and been able to remedy it with a better alternative.

The TIFF 5 and 6 specs are very clear that these fields should only contain "7-bit ASCII". Rendering software has no way to know what character encoding is being used in these fields if not ASCII, and so no characters outside of the ASCII range can be relied upon to render predictably outside of the software which wrote them. Given that, I think marking such TIFFs as invalid is actually fairly desirable, from a preservation perspective, though I certainly sympathize about now needing to deal with the problems our past mistakes have made for us.

The optional parameter isn't a bad idea, if you'd like to implement it, but it does make me think how much better a more general solution would be, allowing us to selectively suppress any messages using our cool newfangled message IDs! Alas... we currently set our validity statuses separately from our messages, so until message objects carry their own status modifiers, parameters may be the way to go.

Then again, a part of me also can't help but feel that by providing an explicit option for this check we may be implying that it's safe or reasonable to ignore this incompatibility under certain circumstances, which I don't believe is necessarily true. So I'm torn... but feel free to add what options you like :)

Copy link
Member Author

@david-russo david-russo Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlwilson Just now saw your reply. Of course you're already working on it :)

@david-russo david-russo added the bug A product defect that needs fixing label Oct 19, 2019
@carlwilson carlwilson removed this from the v1.24-m4 Release milestone Apr 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A product defect that needs fixing feature New functionality to be developed P2 Medium priority issues to be scheduled in a future release
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants