qbinparse

Customizable binary data parser

The goal is to parse various binary formats (e.g. network protocols and data files) flexibly, in a way that suits rapid development.

Build

Download k.h and libq.a from the Kx website.

An example build script is in b.cmd. You might have to fix the -I and -L parameters so they point to the directories of k.h and libq.a.

Loading

Requires KDB 3.5 for the enhanced lambda metadata

\l path/to/qbinparse/qbinparse.q

Usage

Schema format

The parser uses a simple language to describe the schema of the data.

The schema is a series of record definitions, each in the form record recordName fields end.

Each field definition is in the form field name type.

The possible types are:

  • byte, char, short, int, long, real, float: same meaning as in q
  • uint, ushort: unsigned values that are represented in q with the next bigger integer type (a ushort is returned as an int and a uint is returned as a long). There is no ulong because q has no integer type that can represent all of its values.
  • dotnetVarLengthInt: an integer that is stored in a way compatible with .NET's variable-length integer serialization format (each byte encodes 7 bits of the original integer, with the most significant bit indicating that there are more bytes to follow)
  • record recordName: a nested record
  • array elementType size: an array
    • elementType can be one of the atomic types or a record; multi-dimensional arrays are currently not supported
    • size can be specified as:
      • x number: constant length
      • xv fieldName: length is the value of the specified field
      • xz: zero-terminated string
      • tpb number: array has a guard byte with the value number after it
      • tps number: array has a guard short with the value number after it
      • tpi number: array has a guard int with the value number after it
      • repeat: array extends up to the end of the available input - this should be the last element in the main record or used in a parsedArray
  • parsedArray size elementType: an array with internal structure. The size specifies the number of bytes the array takes up, then the parsing process is recursively called on the array. elementType must be a full field type, typically an array or record. If a regular array is used within a parsedArray, it will have its own size, which can be repeat to make it cover the entire parsedArray.
  • case fieldName val1 rec1 val2 rec2 ... [default recD]: a variable-type field that is parsed as one of the specified records based on the value of the tag field. valN are either integers or four-character strings. An optional default case can be added that covers values not listed in the cases.
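The dotnetVarLengthInt encoding described above can be sketched outside q. The following Python snippet is only an illustration of the format, not the library's implementation. The function name decode_7bit_int is hypothetical, and the sketch assumes the least-significant 7-bit group is stored first, as .NET's BinaryWriter.Write7BitEncodedInt does.

```python
from typing import Tuple


def decode_7bit_int(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a 7-bit variable-length integer starting at data[pos].

    Returns (value, next_pos). Assumption: the least-significant
    7-bit group comes first.
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("input ended inside a variable-length integer")
        byte = data[pos]
        pos += 1
        # The low 7 bits carry payload; place them at the current offset.
        value |= (byte & 0x7F) << shift
        # A clear top bit marks the final byte of the integer.
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7
```

For example, the two bytes 0xAC 0x02 decode to 300: 0xAC contributes its low seven bits (44) and signals continuation, and 0x02 contributes 2 * 128.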

In addition the type may be preceded by an operator. The following operator is supported:

  • recSize: the field contains the record size. During parsing, the "end of record" (used to determine which fields run past the end of the input and how much data a repeat field can consume) is set according to this size.

Parsing

First compile the schema:

schema:.binp.compileSchema schemaStr;

Then use the compiled schema on the data:

.binp.parse[schema;0x0000;`mainType]

Serialization (unparsing)

This is the inverse operation of .binp.parse:

.binp.unparse[schema;`a`b!1 2;`mainType]

Examples

An array of 4 ints:

schemaStr:"
    record simple
        field nums array int x 4
    end";
schema:.binp.compileSchema schemaStr;
.binp.parse[schema;0x01000000020000000300000004000000;`simple]

Returns: enlist[`nums]!enlist 1 2 3 4i

The inverse operation:

q).binp.unparse[schema;enlist[`nums]!enlist 1 2 3 4;`simple]
0x01000000020000000300000004000000
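The byte string in this example is four little-endian 32-bit integers. As a cross-check that does not need q or the library, the same layout can be reproduced with Python's struct module. This is only an illustration of the wire format. The variable names are arbitrary.

```python
import struct

# The 16 input bytes from the example above.
payload = bytes.fromhex("01000000020000000300000004000000")

# "<4i" means: little-endian, four signed 32-bit integers.
nums = struct.unpack("<4i", payload)

# Packing the values again reproduces the original bytes.
repacked = struct.pack("<4i", *nums)
```

Here nums is the tuple (1, 2, 3, 4) and repacked equals payload, mirroring the parse and unparse round trip.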

A string with a two-byte length prepended:

schemaStr:"
    record stringWithShortLen
        field length short
        field str array char xv length
    end";
schema:.binp.compileSchema schemaStr;
.binp.parse[schema;0x050048656c6c6f;`stringWithShortLen]

Returns: `length`str!(5h;"Hello")
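A length-prefixed string of this kind can also be built and checked outside q. The Python sketch below is an illustration of the layout only. The helper names pack_short_len_string and unpack_short_len_string are hypothetical, and ASCII text with a little-endian two-byte length is assumed.

```python
import struct


def pack_short_len_string(text: str) -> bytes:
    """Prefix the ASCII bytes of text with a little-endian 16-bit length."""
    raw = text.encode("ascii")
    return struct.pack("<h", len(raw)) + raw


def unpack_short_len_string(data: bytes):
    """Return (length, text); reject a length that overruns the input."""
    (length,) = struct.unpack_from("<h", data, 0)
    body = data[2:2 + length]
    if len(body) < length:
        raise ValueError("declared length runs past the end of the input")
    return length, body.decode("ascii")
```

Packing "Hello" gives the bytes 05 00 48 65 6c 6c 6f: the prefix counts the five characters that follow, so a prefix of 6 with the same five bytes would be rejected.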

See also examples/parse.q for parsing and examples/unparse.q for unparsing.

Error handling

Parsing failures don't throw errors but instead return a partial object with the error inserted into the value of the problematic field as a symbol. Possible errors include:

  • endOfBuffer: attempt to read a field when the read position is already at the end of the input
  • arrayRunsPastInput: an array has a size that would make it cover data past the end of the input
  • tooLargeArray: the array size wouldn't fit into 32 bits
  • noCaseMatch: a case field encountered an input value that is not among the cases and there is no default case

Furthermore if there are extra bytes left over after parsing the main record, the leftover bytes are added to the record with a field named xxxRemainingData. This is also considered a type of error and in particular the .binp.unparse function will ignore this field. To describe a format that allows garbage/padding/irrelevant data at the end, use an array byte repeat field as the last field to capture all the remaining bytes.
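The errors-as-values approach described in this section can be sketched as follows. This Python snippet is not the library's code. It is a minimal illustration, under simplifying assumptions, of returning a partial record with an error marker in place of the failed field, plus an xxxRemainingData entry for leftover bytes. ErrorMarker and parse_int_array are hypothetical names, and the record is assumed to hold a single field nums that is a fixed-size array of little-endian ints.

```python
import struct


class ErrorMarker(str):
    """Stands in for the q symbol that marks a failed field."""


def parse_int_array(data: bytes, count: int = 4) -> dict:
    """Parse `count` little-endian 32-bit ints into a one-field record."""
    result = {}
    size = 4 * count
    if len(data) == 0:
        # Nothing left to read at all.
        result["nums"] = ErrorMarker("endOfBuffer")
    elif size > len(data):
        # The array would extend past the available input.
        result["nums"] = ErrorMarker("arrayRunsPastInput")
    else:
        result["nums"] = list(struct.unpack("<%di" % count, data[:size]))
        if size < len(data):
            # Leftover bytes are reported rather than silently dropped.
            result["xxxRemainingData"] = data[size:]
    return result
```

A caller can then test each field with isinstance(value, ErrorMarker) instead of wrapping the call in exception handling, which keeps the successfully parsed part of the record available for inspection.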