add some low level APIs #154

Merged
merged 9 commits into from
Feb 19, 2024

thejoshwolfe
Owner

@thejoshwolfe thejoshwolfe commented Feb 17, 2024

  • Added readLocalFileHeader() and Class: LocalFileHeader.
  • Added openReadStreamLowLevel().
  • Added getFileNameLowLevel() and parseExtraFields(). Added fields to Class: Entry: fileNameRaw, extraFieldRaw, fileCommentRaw.
  • Added examples/compareCentralAndLocalHeaders.js, which demonstrates many of these low level APIs.
  • Noted dropped support of node versions before 12 in the "engines" field of package.json.
  • Fixed a crash when calling openReadStream() with an explicitly null options parameter (as opposed to omitted).

Here's some of the readme additions copied into this PR for convenience:

getFileNameLowLevel(generalPurposeBitFlag, fileNameBuffer, extraFields, strictFileNames)

If you are setting decodeStrings to false, then this function can be used to decode the file name yourself.
This function is effectively used internally by yauzl to populate the entry.fileName field when decodeStrings is true.

WARNING: This method of getting the file name bypasses the security checks in validateFileName().
You should call that function yourself to be sure to guard against malicious file paths.

generalPurposeBitFlag can be found on an Entry or LocalFileHeader.
Only General Purpose Bit 11 is used, and only when an Info-ZIP Unicode Path Extra Field cannot be found in extraFields.

fileNameBuffer is a Buffer representing the file name field of the entry.
This is entry.fileNameRaw or localFileHeader.fileName.

extraFields is the parsed extra fields array from entry.extraFields or parseExtraFields().

strictFileNames is a boolean, the same as the option of the same name in open().
When false, backslash characters (\) will be replaced with forward slash characters (/).
This function always returns a string, although it may not be a valid file name.
See validateFileName().
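
As a rough sketch of the warning above, decoding and validation can be paired in one helper. The helper name decodeAndValidateFileName is hypothetical, and the yauzl module is passed in as a parameter only to keep the snippet self-contained; in real code it would come from require("yauzl"). It assumes validateFileName() returns null for an acceptable name and an error message string otherwise.

```javascript
// Hypothetical helper, not part of yauzl. `yauzl` is the module object,
// normally obtained with require("yauzl").
function decodeAndValidateFileName(yauzl, entry, strictFileNames) {
  // With decodeStrings set to false, decode the raw name bytes ourselves.
  const fileName = yauzl.getFileNameLowLevel(
    entry.generalPurposeBitFlag,
    entry.fileNameRaw,
    entry.extraFields,
    strictFileNames
  );
  // getFileNameLowLevel() bypasses the security checks, so run them here.
  // Assumption: validateFileName() returns null or an error message string.
  const errorMessage = yauzl.validateFileName(fileName);
  if (errorMessage != null) {
    throw new Error(errorMessage);
  }
  return fileName;
}
```

Throwing on an invalid name is one possible policy; a caller could also skip the entry and continue reading.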

parseExtraFields(extraFieldBuffer)

This function is used internally by yauzl to compute entry.extraFields.
It is exported in case you want to call it on localFileHeader.extraField.

extraFieldBuffer is a Buffer, such as localFileHeader.extraField.
Returns an Array with each item in the form {id: id, data: data},
where id is a Number and data is a Buffer.
Throws an Error if the data encodes an item with a size that exceeds the bounds of the buffer.

You may want to surround calls to this function with try { ... } catch (err) { ... } to handle the error.
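
To make the return shape and the error case concrete, here is a minimal standalone sketch of the behavior described above. It is an illustrative re-implementation, not yauzl's source: parseExtraFieldsSketch and tryParseExtraFields are hypothetical names, and the layout (2-byte little-endian id, 2-byte little-endian size, then the data) is taken from the zipfile spec.

```javascript
// Illustrative re-implementation of the documented behavior (not yauzl's code).
function parseExtraFieldsSketch(extraFieldBuffer) {
  const extraFields = [];
  let i = 0;
  // Each item is: id (uint16 LE), size (uint16 LE), then `size` bytes of data.
  while (i + 4 <= extraFieldBuffer.length) {
    const id = extraFieldBuffer.readUInt16LE(i);
    const size = extraFieldBuffer.readUInt16LE(i + 2);
    const dataStart = i + 4;
    const dataEnd = dataStart + size;
    if (dataEnd > extraFieldBuffer.length) {
      throw new Error("extra field size exceeds the bounds of the buffer");
    }
    extraFields.push({ id: id, data: extraFieldBuffer.subarray(dataStart, dataEnd) });
    i = dataEnd;
  }
  return extraFields;
}

// The try/catch pattern suggested above, wrapped so callers get a value back.
function tryParseExtraFields(extraFieldBuffer) {
  try {
    return { fields: parseExtraFieldsSketch(extraFieldBuffer), error: null };
  } catch (err) {
    return { fields: null, error: err };
  }
}
```

The same try/catch wrapper would apply to the real parseExtraFields() when it is called on an untrusted localFileHeader.extraField.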

readLocalFileHeader(entry, [options], callback)

This is a low-level function you probably don't need to call.
The intended use case is either preparing to call openReadStreamLowLevel()
or simply examining the content of the local file header out of curiosity or for debugging zip file structure issues.

entry is an entry obtained from Event: "entry".
An entry in this library is a file's metadata from a Central Directory Header,
and this function gives the corresponding redundant data in a Local File Header.

options may be omitted or null, and has the following defaults:

{
  minimal: false,
}

If minimal is false (or omitted or null), the callback receives a full LocalFileHeader.
If minimal is true, the callback receives an object with no prototype and a single property: {fileDataStart: fileDataStart}.
For typical zipfile reading use cases, this field is the only one you need,
and yauzl internally effectively uses the {minimal: true} option as part of openReadStream().

The callback receives (err, localFileHeaderOrAnObjectWithJustOneFieldDependingOnTheMinimalOption),
where the type of the second parameter is described in the above discussion of the minimal option.
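
For the debugging use case, one possible sketch is to read the full LocalFileHeader and list the fields that disagree with the central directory entry. Both function names below are hypothetical, and zipfile is assumed to be an open yauzl ZipFile on which readLocalFileHeader() is a method.

```javascript
// Fields that exist under the same name on both Entry and LocalFileHeader.
const SHARED_FIELDS = [
  "versionNeededToExtract", "generalPurposeBitFlag", "compressionMethod",
  "lastModFileTime", "lastModFileDate", "crc32",
  "compressedSize", "uncompressedSize",
];

// Hypothetical helper: returns the names of shared fields whose values differ.
function diffCentralAndLocal(entry, localFileHeader) {
  return SHARED_FIELDS.filter((name) => entry[name] !== localFileHeader[name]);
}

// Hypothetical wrapper: options are omitted, so the callback receives a full
// LocalFileHeader rather than the {minimal: true} object.
function compareHeaders(zipfile, entry, callback) {
  zipfile.readLocalFileHeader(entry, (err, localFileHeader) => {
    if (err) return callback(err);
    callback(null, diffCentralAndLocal(entry, localFileHeader));
  });
}
```

A non-empty result is not necessarily a corrupt file. For example, the local header fields are unprocessed, so ZIP64 entries can legitimately show different size values in the two places.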

openReadStreamLowLevel(fileDataStart, compressedSize, relativeStart, relativeEnd, decompress, uncompressedSize, callback)

This is a low-level function available for advanced use cases. You probably want openReadStream() instead.

The intended use case for this function is calling readEntry() and readLocalFileHeader() with {minimal: true} first,
and then opening the read stream at a later time, possibly after closing and reopening the entire zipfile,
possibly even in a different process.
The parameters are all integers and booleans, which are friendly to serialization.

  • fileDataStart - from localFileHeader.fileDataStart
  • compressedSize - from entry.compressedSize
  • relativeStart - the resolved value of options.start from openReadStream(). Must be a non-negative integer, not null. Typically 0 to start at the beginning of the data.
  • relativeEnd - the resolved value of options.end from openReadStream(). Must be a non-negative integer, not null. Typically entry.compressedSize to include all the data.
  • decompress - boolean indicating whether the data should be piped through a zlib inflate stream.
  • uncompressedSize - from entry.uncompressedSize. Only used when validateEntrySizes is true. If validateEntrySizes is false, this value is ignored, but must still be present, not omitted, in the arguments; you have to give it some value, even if it's null.
  • callback - receives (err, readStream), the same as for openReadStream()

This low-level function does not read any metadata from the underlying storage before opening the read stream.
This is both a performance feature and a safety hazard.
None of the integer parameters are bounds checked.
None of the validation from openReadStream() with respect to compression and encryption is done here either.
Only the bounds checks from validateEntrySizes are done, because that is part of processing the stream data.
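
The serialize-now, open-later workflow might look roughly like the sketch below. toStreamParams and openFromParams are hypothetical helpers, and zipfile is assumed to be an open yauzl ZipFile. Because the low-level call skips the compression and encryption validation, this sketch repeats a simplified version of it up front; treating compression method 8 as "decompress" and method 0 as "stored" is an assumption based on the zipfile spec.

```javascript
// Hypothetical helper: collect everything openReadStreamLowLevel() needs into
// a plain object of numbers and booleans that survives JSON serialization.
function toStreamParams(entry, minimalLocalFileHeader) {
  // Simplified re-check of what openReadStream() would normally validate.
  if (entry.generalPurposeBitFlag & 0x1) {
    throw new Error("entry is encrypted");
  }
  if (entry.compressionMethod !== 0 && entry.compressionMethod !== 8) {
    throw new Error("unsupported compression method: " + entry.compressionMethod);
  }
  return {
    fileDataStart: minimalLocalFileHeader.fileDataStart,
    compressedSize: entry.compressedSize,
    relativeStart: 0,                       // start at the beginning of the data
    relativeEnd: entry.compressedSize,      // include all the data
    decompress: entry.compressionMethod === 8,
    uncompressedSize: entry.uncompressedSize,
  };
}

// Hypothetical helper: later, possibly in another process, open the stream
// from the saved parameters on a (re)opened zipfile.
function openFromParams(zipfile, params, callback) {
  zipfile.openReadStreamLowLevel(
    params.fileDataStart,
    params.compressedSize,
    params.relativeStart,
    params.relativeEnd,
    params.decompress,
    params.uncompressedSize,
    callback
  );
}
```

Since none of the integer parameters are bounds checked, parameters loaded from an untrusted source would need their own validation before openFromParams() is called.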

Class: LocalFileHeader

This is a trivial class that has no methods and only the following properties.
The constructor is available to call, but it doesn't do anything.
See readLocalFileHeader().

See the zipfile spec for what these fields mean.

  • fileDataStart - Number: inferred from fileNameLength, extraFieldLength, and this struct's position in the zipfile.
  • versionNeededToExtract - Number
  • generalPurposeBitFlag - Number
  • compressionMethod - Number
  • lastModFileTime - Number
  • lastModFileDate - Number
  • crc32 - Number
  • compressedSize - Number
  • uncompressedSize - Number
  • fileNameLength - Number
  • extraFieldLength - Number
  • fileName - Buffer
  • extraField - Buffer

Note that unlike Class: Entry, the fileName and extraField are completely unprocessed.
This notably lacks Unicode and ZIP64 handling as well as any kind of safety validation on the file name.
See also parseExtraFields().

Also note that if your object is missing some of these fields,
make sure to read the docs on the minimal option in readLocalFileHeader().
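
To illustrate how fileDataStart follows from the other fields, here is a standalone sketch that reads a Local File Header out of an in-memory Buffer. It is not yauzl's implementation: parseLocalFileHeaderSketch is a hypothetical name, only a few of the fields are extracted, and the byte offsets and the 30-byte fixed header size come from the zipfile spec.

```javascript
const LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;
const LOCAL_FILE_HEADER_FIXED_SIZE = 30;

// Illustrative only: parse the header that starts at `headerOffset` in `buffer`.
function parseLocalFileHeaderSketch(buffer, headerOffset) {
  if (headerOffset + LOCAL_FILE_HEADER_FIXED_SIZE > buffer.length) {
    throw new Error("local file header exceeds the bounds of the buffer");
  }
  if (buffer.readUInt32LE(headerOffset) !== LOCAL_FILE_HEADER_SIGNATURE) {
    throw new Error("invalid local file header signature");
  }
  const fileNameLength = buffer.readUInt16LE(headerOffset + 26);
  const extraFieldLength = buffer.readUInt16LE(headerOffset + 28);
  const fileNameStart = headerOffset + LOCAL_FILE_HEADER_FIXED_SIZE;
  const extraFieldStart = fileNameStart + fileNameLength;
  // The file data begins right after the variable-length name and extra field.
  const fileDataStart = extraFieldStart + extraFieldLength;
  if (fileDataStart > buffer.length) {
    throw new Error("file name or extra field exceeds the bounds of the buffer");
  }
  return {
    fileDataStart: fileDataStart,
    generalPurposeBitFlag: buffer.readUInt16LE(headerOffset + 6),
    compressionMethod: buffer.readUInt16LE(headerOffset + 8),
    fileNameLength: fileNameLength,
    extraFieldLength: extraFieldLength,
    fileName: buffer.subarray(fileNameStart, extraFieldStart),   // raw, unprocessed
    extraField: buffer.subarray(extraFieldStart, fileDataStart), // raw, unprocessed
  };
}
```

As with the real class, fileName and extraField stay as raw Buffers here, so decoding and validation remain the caller's job.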

@thejoshwolfe thejoshwolfe changed the title add readLocalFileHeader() add readLocalFileHeader() and openReadStreamLowLevel() Feb 17, 2024
@thejoshwolfe thejoshwolfe changed the title add readLocalFileHeader() and openReadStreamLowLevel() add more low level APIs Feb 18, 2024
@thejoshwolfe thejoshwolfe changed the title add more low level APIs add some low level APIs Feb 19, 2024
@thejoshwolfe thejoshwolfe marked this pull request as ready for review February 19, 2024 01:11
@thejoshwolfe thejoshwolfe merged commit 47d60a9 into master Feb 19, 2024
5 checks passed
@thejoshwolfe thejoshwolfe deleted the local-file-header branch February 19, 2024 01:11