feat(handlers): check header checksum in tar handler
The pattern used by the Unix V7 (old-style) tar handler is not strict
enough to prevent false positives, so checking the header checksum
should filter out most of these false matches.

The header chksum field holds the octal representation of the sum of all
header bytes taken as unsigned integers, where the chksum field itself is
counted as 8 spaces. The octal digits are followed by a NUL and a space
(some tar files have these two bytes reversed).
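A minimal sketch of how such a chksum field can be produced, assuming the standard 512-byte header block with the 8-byte chksum field at offset 148 and a six-digit octal value. The helper names (`unsigned_header_sum`, `format_chksum`, `write_chksum`) are hypothetical and not part of unblob. The sketch sums the whole block; for a V7 header the bytes past offset 257 are zero, so the result equals the V7-only sum.

```python
CHKSUM_OFFSET = 148  # start of the 8-byte chksum field in a tar header
CHKSUM_SIZE = 8


def unsigned_header_sum(header: bytes) -> int:
    """Sum every header byte as an unsigned int, counting chksum as 8 spaces."""
    blanked = (
        header[:CHKSUM_OFFSET]
        + b" " * CHKSUM_SIZE  # the chksum field is treated as blanks
        + header[CHKSUM_OFFSET + CHKSUM_SIZE :]
    )
    return sum(blanked)


def format_chksum(value: int) -> bytes:
    """Encode the sum as six octal digits followed by NUL and space."""
    return b"%06o\x00 " % value


def write_chksum(header: bytes) -> bytes:
    """Return a copy of the header with its chksum field filled in."""
    field = format_chksum(unsigned_header_sum(header))
    return header[:CHKSUM_OFFSET] + field + header[CHKSUM_OFFSET + CHKSUM_SIZE :]
```

For a block whose only non-zero field is the name `hello.txt`, the name contributes 930 and the blanked field 256, so the sum is 1186, stored as `002242` followed by NUL and space. Because the field is blanked before summing, recomputing the sum after writing it gives the same value, which is what makes the check verifiable.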

Multiple candidate checksums are calculated, because the old V7 header is
much shorter than the newer (ustar) headers. Wikipedia also mentions that
some historic implementations used signed sums, so a signed variant is
calculated as well. The potential match is discarded if the header
checksum is not one of the calculated checksums.
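A sketch of the verification side, assuming the same offsets the diff uses: the V7 fields end at byte 257 and the ustar fields at byte 500. The signed variant treats each byte as a two's-complement signed char, so bytes 0x80..0xff contribute -128..-1. The names `candidate_checksums` and `checksum_matches` are hypothetical, not unblob's API.

```python
from typing import Tuple

CHKSUM_OFFSET = 148  # start of the 8-byte chksum field
CHKSUM_SIZE = 8
V7_HEADER_END = 257  # old-style (V7) header fields end here
USTAR_HEADER_END = 500  # ustar fields end here; the rest of the block is padding


def signed_sum(octets: bytes) -> int:
    """Sum bytes as signed chars: 0x80..0xff contribute -128..-1."""
    return sum(b if b < 128 else b - 256 for b in octets)


def candidate_checksums(header: bytes) -> Tuple[int, int, int]:
    """Return (V7 unsigned, V7 + ustar unsigned, V7 signed) sums."""
    v7_bytes = (
        header[:CHKSUM_OFFSET]
        + b" " * CHKSUM_SIZE  # chksum field counted as blanks
        + header[CHKSUM_OFFSET + CHKSUM_SIZE : V7_HEADER_END]
    )
    ustar_bytes = header[V7_HEADER_END:USTAR_HEADER_END]
    unsigned = sum(v7_bytes)
    return (unsigned, unsigned + sum(ustar_bytes), signed_sum(v7_bytes))


def checksum_matches(header: bytes) -> bool:
    """Accept a header only if its stored chksum equals one of the candidates."""
    field = header[CHKSUM_OFFSET : CHKSUM_OFFSET + CHKSUM_SIZE]
    # six octal digits, then NUL + space (or space + NUL in some archives)
    if field[6:8] not in (b"\x00 ", b" \x00"):
        return False
    try:
        stored = int(field[:6], 8)
    except ValueError:  # not octal digits: cannot be a valid header
        return False
    return stored in candidate_checksums(header)


if __name__ == "__main__":
    hdr = bytearray(512)
    hdr[0:9] = b"hello.txt"
    hdr[257:265] = b"ustar\x0000"  # ustar magic + version
    print(candidate_checksums(bytes(hdr)))  # (1186, 1841, 1186)
```

Accepting any of the candidates keeps genuine V7, ustar, and signed-sum archives, while a random block that happens to match the V7 pattern is very unlikely to carry a matching octal sum.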
e3krisztian authored and qkaiser committed Oct 9, 2023
1 parent 5f8997e commit 43c2d34
Showing 1 changed file with 30 additions and 0 deletions:
unblob/handlers/archive/tar.py
```diff
@@ -136,6 +136,36 @@ def calculate_chunk(self, file: File, start_offset: int) -> Optional[ValidChunk]
         header_size = snull(header.size)
         decode_int(header_size, 8)
 
+        def signed_sum(octets) -> int:
+            return sum(b if b < 128 else b - 256 for b in octets)  # bytes >= 0x80 are negative signed chars
+
+        if header.chksum[6:8] not in (b"\x00 ", b" \x00"):
+            logger.error(
+                "Tar handler: invalid checksum format",
+                actual_last_2_bytes=header.chksum[6:8],
+            )
+            return None
+        checksum = decode_int(header.chksum[:6], 8)
+        header_bytes_for_checksum = (
+            file[start_offset : start_offset + 148]
+            + b" " * 8  # chksum field is replaced with "blanks"
+            + file[start_offset + 156 : start_offset + 257]
+        )
+        extended_header_bytes = file[start_offset + 257 : start_offset + 500]
+        calculated_checksum_unsigned = sum(header_bytes_for_checksum)
+        calculated_checksum_signed = signed_sum(header_bytes_for_checksum)
+        checksums = (
+            calculated_checksum_unsigned,
+            calculated_checksum_unsigned + sum(extended_header_bytes),
+            # signed is of historical interest, calculating for the extended header is not needed
+            calculated_checksum_signed,
+        )
+        if checksum not in checksums:
+            logger.error(
+                "Tar header checksum mismatch", expected=str(checksum), actual=checksums
+            )
+            return None
+
         end_offset = _get_tar_end_offset(file, start_offset)
         if end_offset == -1:
             return None
```