Merge pull request #685 from jcjgraf/core/jpgheader
Improve JPEG Header Detection
karlch authored Aug 16, 2023
2 parents 0b0b2c6 + 30bff5c commit 4330b0c
Showing 2 changed files with 29 additions and 16 deletions.
5 changes: 5 additions & 0 deletions docs/changelog.rst
@@ -17,6 +17,11 @@ Added:
 * ``PySide6``: Use PySide6 (Qt for Python). This is highly experimental and should be
   used with care.
 
+Changed:
+^^^^^^^^
+* The JPEG image header check was simplified to have a false negative rate of 0, while
+  maintaining a decently low false positive rate.
+
 Fixed:
 ^^^^^^
 
40 changes: 24 additions & 16 deletions vimiv/utils/imageheader.py
@@ -140,28 +140,36 @@ def check_verified(header: bytes, file: BinaryIO) -> bool:
 
 
 def _test_jpg(h: bytes, _f: BinaryIO) -> bool:
-    """Joint Photographic Experts Group (JPEG) in different kinds of "subtypes"(?).
+    """Joint Photographic Experts Group (JPEG).
 
-    Extension: .jpeg, .jpg
+    Extension: .jpeg, .jpg (and probably more)
 
+    Most JPEG images are of JPEG/JFIF or JPEG/Exif format, but every manufacturer can
+    create their own JPEG-based file formats. Therefore, there are many different JPEG-
+    based file formats.
+
+    JPEGs are a list of segments. Each segment starts with a 1-byte marker. Each marker
+    is preceded by byte 0xFF. Each valid JPEG starts with the segment "Start of Image
+    (SOI)", which has marker 0xD8.
+
+    The SOI segment is followed by an APPn segment (APP0 to APP15), which is used by the
+    different file formats. Since there are more file formats than APPn markers, an APPn
+    segment often starts with a signature of its own, like JFIF or Exif for JPEG/JFIF or
+    JPEG/Exif, respectively.
+
+    However, there are also "raw" JPEGs that do not start with an APPn segment but with
+    the image data directly.
+
+    The only common denominator of all JPEG file formats seems to be the first three
+    bytes; apparently, not even the end segment is used consistently.
+
     Magic bytes:
-    --> FF D8 FF DB
-     -> .. .. .. ..
-    --> FF D8 FF E0 (only for JPG and not JPEG, but no need to differentiate)
-     -> .. .. .. ..
-    --> FF D8 FF E0 00 10 4A 46 49 46 00 01 (covered by prior)
-     -> .. .. .. .. .. .. J  F  I  F  .. ..
-    --> FF D8 FF EE
-     -> .. .. .. ..
-    --> FF D8 FF E1 ?? ?? 45 78 69 66 00 00
-     -> .. .. .. .. .. .. E  x  i  f  .. ..
+    --> FF D8 FF
+     -> .. .. ..
 
     Support: native
     """
-    return h[:3] == b"\xFF\xD8\xFF" and (
-        h[3] in [0xDB, 0xE0, 0xEE]
-        or (h[3] == 0xE1 and h[6:12] == b"\x45\x78\x69\x66\x00\x00")
-    )
+    return h[:3] == b"\xFF\xD8\xFF"
 
 
 def _test_png(h: bytes, _f: BinaryIO) -> bool:
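
The effect of this change can be illustrated with a small sketch that runs the previous check (the multi-condition return) and the new check (the plain three-byte prefix comparison) on a few sample headers. The function names `old_check` and `new_check` and the sample byte strings are hypothetical and hand-built for illustration; they are not taken from vimiv's code or test suite, and the APP2 sample merely assumes a file whose first segment after SOI is APP2.

```python
def old_check(h: bytes) -> bool:
    """Previous logic: only a few known fourth bytes were accepted."""
    return h[:3] == b"\xFF\xD8\xFF" and (
        h[3] in [0xDB, 0xE0, 0xEE]
        or (h[3] == 0xE1 and h[6:12] == b"\x45\x78\x69\x66\x00\x00")
    )


def new_check(h: bytes) -> bool:
    """New logic: SOI (FF D8) followed by the 0xFF that starts the next marker."""
    return h[:3] == b"\xFF\xD8\xFF"


# Hand-built headers (assumptions for illustration, not real files).
samples = {
    "JFIF (APP0)": b"\xFF\xD8\xFF\xE0\x00\x10JFIF\x00\x01",
    "Exif (APP1)": b"\xFF\xD8\xFF\xE1\x00\x18Exif\x00\x00",
    "APP2 first": b"\xFF\xD8\xFF\xE2\x00\x10ICC_PROFILE",
    "PNG": b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",
}

# Map each sample name to (old result, new result).
results = {name: (old_check(h), new_check(h)) for name, h in samples.items()}

for name, (old, new) in results.items():
    print(f"{name:12} old={old!s:5} new={new}")
```

In this sketch, the header starting with an APP2 segment is rejected by the previous check but accepted by the new one, which is the kind of false negative the changelog entry refers to. The price is that any data beginning with `FF D8 FF` is now accepted, so the check relies on those three bytes being rare at the start of non-JPEG files.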