Ignore non-breaking spaces and word joiners in s-041
When the text being compared is from the image's alt attribute, since
they are not used in that context.
apasel422 committed Jul 9, 2024
1 parent 1207259 commit a303d06
Showing 4 changed files with 16 additions and 1 deletion.
5 changes: 4 additions & 1 deletion se/se_epub_lint.py
@@ -1517,13 +1517,16 @@ def _lint_special_file_checks(self, filename: Path, dom: se.easy_xml.EasyXmlTree
 loi_text_matches_figure = False
 for child in figure.xpath("./img|./figcaption"):
     figure_text = ""
+    loi_text_to_compare = loi_text
     if child.tag == "img":
         figure_text = child.get_attr("alt")
+        # Replace/remove characters that don't appear in alt attributes.
+        loi_text_to_compare = loi_text_to_compare.replace(se.NO_BREAK_SPACE, ' ').replace(se.WORD_JOINER, '')
     elif child.tag == "figcaption":
         # Replace tabs and newlines with a single space to better match figcaptions that contain <br/>
         figure_text = regex.sub(r"[ \n\t]+", " ", child.inner_text())

-    if loi_text == figure_text:
+    if loi_text_to_compare == figure_text:
         loi_text_matches_figure = True
         break
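
The effect of this hunk can be sketched in isolation. The snippet below is an illustrative stand-alone version, not the code in `se_epub_lint.py`: the function name `loi_matches_figure` and its parameters are hypothetical, the two constants are assumed to match `se.NO_BREAK_SPACE` and `se.WORD_JOINER`, and the standard-library `re` module stands in for the third-party `regex` module used in the diff.

```python
import re
from typing import Optional

# Assumed to mirror se.NO_BREAK_SPACE and se.WORD_JOINER.
NO_BREAK_SPACE = "\u00a0"
WORD_JOINER = "\u2060"


def loi_matches_figure(loi_text: str, alt: Optional[str], figcaption: Optional[str]) -> bool:
    """Hypothetical helper: does the LoI entry match the alt text or the caption?"""
    if alt is not None:
        # Alt attributes carry no non-breaking spaces or word joiners,
        # so normalize the LoI text before comparing against alt.
        normalized = loi_text.replace(NO_BREAK_SPACE, " ").replace(WORD_JOINER, "")
        if normalized == alt:
            return True
    if figcaption is not None:
        # Captions are compared against the unmodified LoI text; only runs of
        # spaces, tabs, and newlines in the caption are collapsed.
        if loi_text == re.sub(r"[ \n\t]+", " ", figcaption):
            return True
    return False


# LoI text with a non-breaking space after "Mr." and word joiners around the dash.
loi = "Mr.\u00a0Smith 2\u2060\u20133\u2060 years ago."
plain = "Mr. Smith 2\u20133 years ago."
matches_alt = loi_matches_figure(loi, plain, None)
matches_caption = loi_matches_figure(loi, None, plain)
```

Under these assumptions `matches_alt` is true and `matches_caption` is false, which corresponds to the new `#f-8` and `#f-9` fixtures: only the caption case is still reported by s-041.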

3 changes: 3 additions & 0 deletions tests/lint/semantic/s-041/golden/s-041-out.txt
@@ -1,9 +1,12 @@
 s-004 [Error] chapter-1.xhtml `img` element missing `alt` attribute.
 <img>
 <img>
+<img>
 s-041 [Manual Review] chapter-1.xhtml The text in `#f-5`'s LoI entry does not
 match either its `<figcaption>` element or its `<img>` `alt` attribute.
 s-041 [Manual Review] chapter-1.xhtml The text in `#f-6`'s LoI entry does not
 match either its `<figcaption>` element or its `<img>` `alt` attribute.
 s-041 [Manual Review] chapter-1.xhtml The text in `#f-7`'s LoI entry does not
 match either its `<figcaption>` element or its `<img>` `alt` attribute.
+s-041 [Manual Review] chapter-1.xhtml The text in `#f-9`'s LoI entry does not
+match either its `<figcaption>` element or its `<img>` `alt` attribute.
3 changes: 3 additions & 0 deletions tests/lint/semantic/s-041/in/src/epub/text/chapter-1.xhtml
@@ -16,6 +16,9 @@
 <!-- will cause s-004 to be emitted, but we are deliberately testing behavior with missing alt -->
 <figure id="f-6"><img/><figcaption>t-6-cap</figcaption></figure>
 <figure id="f-7"><img alt="t-7-alt."/><figcaption>t-7-cap</figcaption></figure>
+
+<figure id="f-8"><img alt="Mr. Smith 2–3 years ago."/></figure>
+<figure id="f-9"><img/><figcaption>Mr. Smith 2–3 years ago.</figcaption></figure>
 </section>
 </body>
 </html>
6 changes: 6 additions & 0 deletions tests/lint/semantic/s-041/in/src/epub/text/loi.xhtml
@@ -17,6 +17,12 @@
 <li><p><a href="chapter-1.xhtml#f-5">x</a></p></li>
 <li><p><a href="chapter-1.xhtml#f-6">x</a></p></li>
 <li><p><a href="chapter-1.xhtml#f-7">x</a></p></li>
+
+<!-- text matches alt once tags and word joiners and non-breaking spaces are removed -->
+<li><p><a href="chapter-1.xhtml#f-8"><abbr epub:type="z3998:name-title">Mr.</abbr> Smith 2⁠–⁠3 years ago.</a></p></li>
+
+<!-- text does not match caption due to differing word joiners and non-breaking spaces -->
+<li><p><a href="chapter-1.xhtml#f-9"><abbr epub:type="z3998:name-title">Mr.</abbr> Smith 2⁠–⁠3 years ago.</a></p></li>
 </ol>
 </nav>
 </body>