Read simple glyph flags #392

Open: wants to merge 7 commits into base: main
65 changes: 57 additions & 8 deletions doc/reference.md
@@ -27,6 +27,7 @@ elaboration, and core language is forthcoming.
- [Overlap formats](#overlap-formats)
- [Number formats](#number-formats)
- [Array formats](#array-formats)
- [Map formats](#map-formats)
- [Repeat formats](#repeat-formats)
- [Limit formats](#limit-formats)
- [Stream position formats](#stream-position-formats)
@@ -505,22 +506,62 @@ of the host [array types](#array-types).
| `array32 len format` | `Array32 len (Repr format)` |
| `array64 len format` | `Array64 len (Repr format)` |

### Map formats

There are four array map formats, corresponding to the four [array types](#arrays).
These allow mapping a supplied function over the elements of an array in order
to parse another array:

- `array8_map : fun (len : U8) -> fun (A : Type) -> (A -> Format) -> Array8 len A -> Format`
- `array16_map : fun (len : U16) -> fun (A : Type) -> (A -> Format) -> Array16 len A -> Format`
- `array32_map : fun (len : U32) -> fun (A : Type) -> (A -> Format) -> Array32 len A -> Format`
- `array64_map : fun (len : U64) -> fun (A : Type) -> (A -> Format) -> Array64 len A -> Format`

#### Representation of map formats

The [representation](#format-representations) of the array map formats preserves the
lengths, and uses the representation of the map function as the element types
of the host [array types](#array-types).

| format | `Repr` format |
|----------------------------------|-----------------------------|
| `array8_map len A map_fn array` | `Array8 len (Repr map_fn)` |
| `array16_map len A map_fn array` | `Array16 len (Repr map_fn)` |
| `array32_map len A map_fn array` | `Array32 len (Repr map_fn)` |
| `array64_map len A map_fn array` | `Array64 len (Repr map_fn)` |

Comment on lines +509 to +532
Member

This looks suspicious. Is the idea that each element of the array can produce a different format? If so this isn’t really represented in the definition of Repr... and we don’t have a heterogeneous array type to properly model this (where the types of the elements could be different).

Relatedly, there's also a type error in `Repr map_fn`. The type signature of `map_fn` is `A -> Format`, but `Repr` expects an argument of type `Format`. Sorry if I didn't catch this!

Alas, no ideas come to mind yet, other than using constrained formats to constrain the `array*_map` formats, plus a `map` format that lets you map each element format so that they all share a common representation:

array8_map : fun (len : U8) (A : Type) (B : Type) -> (A -> Format [Repr = B]) -> Array8 len A -> Format

Repr (array8_map len A B map_fn array) = Array8 len B
map : fun (f : Format) (B : Type) -> (Repr f -> B) -> Format

Repr (map f B fn) = B
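
As a rough illustration of the `map` idea above (not Fathom code), here is a minimal Rust sketch. `Cursor`, `u8_format`, `s16be_format`, and `map_format` are hypothetical names invented for this example; a format with representation `T` is modelled as a function from a cursor to `Option<T>`.

```rust
/// Hypothetical cursor over owned input bytes (a stand-in for a real reader).
struct Cursor {
    data: Vec<u8>,
    pos: usize,
}

/// Reads one unsigned byte; its representation is `u8`.
fn u8_format(cursor: &mut Cursor) -> Option<u8> {
    let byte = *cursor.data.get(cursor.pos)?;
    cursor.pos += 1;
    Some(byte)
}

/// Reads a big-endian signed 16-bit integer; its representation is `i16`.
fn s16be_format(cursor: &mut Cursor) -> Option<i16> {
    let bytes = cursor.data.get(cursor.pos..cursor.pos + 2)?;
    let value = i16::from_be_bytes([bytes[0], bytes[1]]);
    cursor.pos += 2;
    Some(value)
}

/// Analogue of the proposed `map f B fn`: parse with `format`, then convert
/// the parsed value with `convert`, so the result has representation `B`.
fn map_format<T, B>(
    format: impl Fn(&mut Cursor) -> Option<T>,
    convert: impl Fn(T) -> B,
) -> impl Fn(&mut Cursor) -> Option<B> {
    move |cursor: &mut Cursor| format(cursor).map(&convert)
}
```

With this, `map_format(u8_format, i16::from)` and `s16be_format` both yield `i16`, so a per-element choice between them can still fill an array with a single element type.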

Contributor

It feels like map is something that happens to parsed arrays to produce other parsed arrays, rather than being a format itself 🤔

Contributor Author

> Is the idea that each element of the array can produce a different format?

To be honest, I'm not sure. My grasp on this whole thing is tenuous at best; each time I think I have a handle on it, it escapes my grasp. In the motivating example the key thing is this function:

let read_coord = fun (is_short : Repr flag -> Bool) => fun (is_same : Repr flag -> Bool) => fun (f : Repr flag) => {
    coord <- match (is_short f) {
        true => u8,
        false => match (is_same f) {
            true => succeed S16 0,
            false => s16be,
        }
    }
},

In order to read a coordinate you need to take the corresponding flag for that coordinate, and based on the bits that are set in it you will read 0, 1, or 2 bytes from the input. In my perhaps broken mental model this was the same format: the same function is used to read all values.
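
To make the 0, 1, or 2 byte behaviour concrete, here is a minimal Rust sketch of the same branching. It is an illustration only, not the Fathom implementation: the function signature is hypothetical, the flag predicates are left as parameters as in the original, and the one-byte case is widened to `i16` (an assumption) so that all three branches return the same type.

```rust
/// Reads one coordinate from the front of `input`, following the branching
/// of `read_coord` above. Returns the value and the remaining input, or
/// `None` if the input is too short.
fn read_coord<'a>(
    is_short: impl Fn(u8) -> bool,
    is_same: impl Fn(u8) -> bool,
    flag: u8,
    input: &'a [u8],
) -> Option<(i16, &'a [u8])> {
    if is_short(flag) {
        // One byte: `u8` in the original, widened here to `i16`.
        let (&byte, rest) = input.split_first()?;
        Some((i16::from(byte), rest))
    } else if is_same(flag) {
        // Zero bytes: corresponds to `succeed S16 0`.
        Some((0, input))
    } else {
        // Two bytes: corresponds to `s16be`.
        if input.len() < 2 {
            return None;
        }
        let (head, rest) = input.split_at(2);
        Some((i16::from_be_bytes([head[0], head[1]]), rest))
    }
}
```

The same function handles every flag, but the number of bytes consumed, and in the original the representation type, depends on the flag value. That dependence is what the question about per-element formats is getting at.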

Member

@brendanzab brendanzab Sep 2, 2022

> It feels like map is something that happens to parsed arrays to produce other parsed arrays, rather than being a format itself 🤔

Yeah, the naming is probably a bit off at any rate. I think this is more like… `traverse` from Haskell land? I think?

traverse : (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverse : fun {len} {A, B} -> (A -> FormatOf B) -> Array len A -> FormatOf (Array len B)

where:

  • f is FormatOf
  • t is Array len

### Repeat formats

The `repeat_until_end` format repeats parsing the given format until the end of
the current binary stream is reached:

- `repeat_until_end : Format -> Format`

There are four repeat until full formats that parse and replicate a format until
a given length is reached:

- `repeat_until_full8 : U8 -> fun (A : Format) -> (Repr A -> U8) -> Format`
- `repeat_until_full16 : U16 -> fun (A : Format) -> (Repr A -> U16) -> Format`
- `repeat_until_full32 : U32 -> fun (A : Format) -> (Repr A -> U32) -> Format`
- `repeat_until_full64 : U64 -> fun (A : Format) -> (Repr A -> U64) -> Format`

The supplied function is applied to each parsed value and returns how many
times that value is replicated into the final array.

#### Representation of repeat formats

Because the repeat format does not have a predefined length, it is
Because the repeat until end format does not have a predefined length, it is
[represented](#format-representations) as a dynamically sized
[array type](#array-types):
[array type](#array-types). The repeat until full format preserves the length
in its [representation](#format-representations):

| format | `Repr` format |
| ------------------------- | --------------------- |
| `repeat_until_end format` | `Array (Repr format)` |
| format                                     | `Repr` format               |
|--------------------------------------------|-----------------------------|
| `repeat_until_end format`                  | `Array (Repr format)`       |
| `repeat_until_full8 len format replicate`  | `Array8 len (Repr format)`  |
| `repeat_until_full16 len format replicate` | `Array16 len (Repr format)` |
| `repeat_until_full32 len format replicate` | `Array32 len (Repr format)` |
| `repeat_until_full64 len format replicate` | `Array64 len (Repr format)` |

### Limit formats

@@ -595,11 +636,18 @@ embedded in the resulting parsed output.

- `succeed : fun (A : Type) -> A -> Format`

The `or_succeed` format accepts a boolean condition value. If the condition
value is `true` then the format is used, otherwise the default value is used,
consuming no input:

- `or_succeed : Bool -> fun (A : Format) -> Repr A -> Format`

#### Representation of succeed formats

| format | `Repr` format |
| ------------- | ------------- |
| `succeed A a` | `A` |
| format | `Repr` format |
|----------------------------------|---------------|
| `succeed A a` | `A` |
| `or_succeed cond format default` | `Repr format` |

### Fail format

@@ -845,6 +893,7 @@ A number of operations are defined for the numeric types:
- `u16_and : U16 -> U16 -> U16`
- `u16_or : U16 -> U16 -> U16`
- `u16_xor : U16 -> U16 -> U16`
- `u16_from_u8 : U8 -> U16`

- `u32_eq : U32 -> U32 -> Bool`
- `u32_neq : U32 -> U32 -> Bool`
27 changes: 27 additions & 0 deletions fathom/src/core.rs
@@ -371,8 +371,32 @@ def_prims! {
FormatArray32 => "array32",
/// Array formats, with unsigned 64-bit indices.
FormatArray64 => "array64",
/// Map a function over an Array8
FormatArray8Map => "array8_map",
/// Map a function over an Array16
FormatArray16Map => "array16_map",
/// Map a function over an Array32
FormatArray32Map => "array32_map",
/// Map a function over an Array64
FormatArray64Map => "array64_map",
/// Repeat a format until the length of the given parse scope is reached.
FormatRepeatUntilEnd => "repeat_until_end",
/// Repeat a format until the array with 8-bit indices is filled.
///
/// The value read by the format is replicated according to a supplied function.
FormatRepeatUntilFull8 => "repeat_until_full8",
/// Repeat a format until the array with 16-bit indices is filled.
///
/// The value read by the format is replicated according to a supplied function.
FormatRepeatUntilFull16 => "repeat_until_full16",
/// Repeat a format until the array with 32-bit indices is filled.
///
/// The value read by the format is replicated according to a supplied function.
FormatRepeatUntilFull32 => "repeat_until_full32",
/// Repeat a format until the array with 64-bit indices is filled.
///
/// The value read by the format is replicated according to a supplied function.
FormatRepeatUntilFull64 => "repeat_until_full64",
/// Limit the format to an unsigned 8-bit byte length.
FormatLimit8 => "limit8",
/// Limit the format to an unsigned 16-bit byte length.
@@ -390,6 +414,8 @@ def_prims! {
FormatDeref => "deref",
/// A format that always succeeds with some data.
FormatSucceed => "succeed",
/// Read a format if a supplied condition is true, otherwise succeed with a default value.
FormatOrSucceed => "or_succeed",
/// A format that always fails to parse.
FormatFail => "fail",
/// Unwrap an option, or fail to parse.
@@ -440,6 +466,7 @@ def_prims! {
U16And => "u16_and",
U16Or => "u16_or",
U16Xor => "u16_xor",
U16FromU8 => "u16_from_u8",

U32Eq => "u32_eq",
U32Neq => "u32_neq",
103 changes: 97 additions & 6 deletions fathom/src/core/binary.rs
@@ -411,19 +411,28 @@ impl<'arena, 'env, 'data> Context<'arena, 'env, 'data> {
(Prim::FormatF32Le, []) => read_const(reader, span, read_f32le, Const::F32),
(Prim::FormatF64Be, []) => read_const(reader, span, read_f64be, Const::F64),
(Prim::FormatF64Le, []) => read_const(reader, span, read_f64le, Const::F64),
(Prim::FormatArray8, [FunApp(len), FunApp(format)]) => self.read_array(reader, span, len, format),
(Prim::FormatArray16, [FunApp(len), FunApp(format)]) => self.read_array(reader, span, len, format),
(Prim::FormatArray32, [FunApp(len), FunApp(format)]) => self.read_array(reader, span, len, format),
(Prim::FormatArray8, [FunApp(len), FunApp(format)]) |
(Prim::FormatArray16, [FunApp(len), FunApp(format)]) |
(Prim::FormatArray32, [FunApp(len), FunApp(format)]) |
(Prim::FormatArray64, [FunApp(len), FunApp(format)]) => self.read_array(reader, span, len, format),
(Prim::FormatArray8Map, [_, _, FunApp(map_fn), FunApp(array)]) |
(Prim::FormatArray16Map, [_, _, FunApp(map_fn), FunApp(array)]) |
(Prim::FormatArray32Map, [_, _, FunApp(map_fn), FunApp(array)]) |
(Prim::FormatArray64Map, [_, _, FunApp(map_fn), FunApp(array)]) => self.array_map(reader, span, map_fn, array),
(Prim::FormatRepeatUntilEnd, [FunApp(format)]) => self.read_repeat_until_end(reader, format),
(Prim::FormatLimit8, [FunApp(limit), FunApp(format)]) => self.read_limit(reader, limit, format),
(Prim::FormatLimit16, [FunApp(limit), FunApp(format)]) => self.read_limit(reader, limit, format),
(Prim::FormatLimit32, [FunApp(limit), FunApp(format)]) => self.read_limit(reader, limit, format),
(Prim::FormatRepeatUntilFull8, [FunApp(len), FunApp(format), FunApp(replicate)]) |
(Prim::FormatRepeatUntilFull16, [FunApp(len), FunApp(format), FunApp(replicate)]) |
(Prim::FormatRepeatUntilFull32, [FunApp(len), FunApp(format), FunApp(replicate)]) |
(Prim::FormatRepeatUntilFull64, [FunApp(len), FunApp(format), FunApp(replicate)]) => self.read_repeat_until_full(reader, len, replicate, format),
(Prim::FormatLimit8, [FunApp(limit), FunApp(format)]) |
(Prim::FormatLimit16, [FunApp(limit), FunApp(format)]) |
(Prim::FormatLimit32, [FunApp(limit), FunApp(format)]) |
(Prim::FormatLimit64, [FunApp(limit), FunApp(format)]) => self.read_limit(reader, limit, format),
(Prim::FormatLink, [FunApp(pos), FunApp(format)]) => self.read_link(span, pos, format),
(Prim::FormatDeref, [FunApp(format), FunApp(r#ref)]) => self.read_deref(format, r#ref),
(Prim::FormatStreamPos, []) => read_stream_pos(reader, span),
(Prim::FormatSucceed, [_, FunApp(elem)]) => Ok(elem.clone()),
(Prim::FormatOrSucceed, [FunApp(cond), FunApp(format), FunApp(default)]) => self.read_or_succeed(reader, cond, format, default),
(Prim::FormatFail, []) => Err(ReadError::ReadFailFormat(span)),
(Prim::FormatUnwrap, [_, FunApp(option)]) => match option.match_prim_spine() {
Some((Prim::OptionSome, [FunApp(elem)])) => Ok(elem.clone()),
@@ -456,6 +465,30 @@ impl<'arena, 'env, 'data> Context<'arena, 'env, 'data> {
Ok(Spanned::new(span, Arc::new(Value::ArrayLit(elem_exprs))))
}

fn array_map(
&mut self,
reader: &mut BufferReader<'data>,
span: Span,
map_fn: &ArcValue<'arena>,
array: &ArcValue<'arena>,
) -> Result<ArcValue<'arena>, ReadError<'arena>> {
let array = self.elim_env().force(array);
let array = match array.as_ref() {
Value::ArrayLit(ary) => ary,
_ => return Err(ReadError::InvalidValue(array.span())),
};

let elem_exprs = array
.iter()
.map(|elem| {
let elem_format = self.elim_env().fun_app(map_fn.clone(), elem.clone());
self.read_format(reader, &elem_format)
})
.collect::<Result<_, _>>()?;

Ok(Spanned::new(span, Arc::new(Value::ArrayLit(elem_exprs))))
}

fn read_repeat_until_end(
&mut self,
reader: &mut BufferReader<'data>,
@@ -484,6 +517,50 @@ impl<'arena, 'env, 'data> Context<'arena, 'env, 'data> {
}
}

fn read_repeat_until_full(
&mut self,
reader: &mut BufferReader<'data>,
len: &ArcValue<'arena>,
replicate: &ArcValue<'arena>,
elem_format: &ArcValue<'arena>,
) -> Result<ArcValue<'arena>, ReadError<'arena>> {
let len = match self.elim_env().force(len).as_ref() {
Value::ConstLit(Const::U8(len, _)) => Some(usize::from(*len)),
Value::ConstLit(Const::U16(len, _)) => Some(usize::from(*len)),
Value::ConstLit(Const::U32(len, _)) => usize::try_from(*len).ok(),
Value::ConstLit(Const::U64(len, _)) => usize::try_from(*len).ok(),
_ => return Err(ReadError::InvalidValue(len.span())),
}
.ok_or_else(|| ReadError::InvalidValue(len.span()))?;
let replicate = self.elim_env().force(replicate);

let mut elems = Vec::with_capacity(len);
while elems.len() < len {
match self.read_format(reader, elem_format) {
Ok(elem) => {
// Call the function to determine how many items this represents
let closure_res = self.elim_env().fun_app(replicate.clone(), elem.clone());
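// The replicate function returns U8, U16, U32 or U64 depending on the
// `repeat_until_full*` variant, so accept all four widths (as for `len`).
let repeat = match closure_res.as_ref() {
Value::ConstLit(Const::U8(n, _)) => Some(usize::from(*n)),
Value::ConstLit(Const::U16(n, _)) => Some(usize::from(*n)),
Value::ConstLit(Const::U32(n, _)) => usize::try_from(*n).ok(),
Value::ConstLit(Const::U64(n, _)) => usize::try_from(*n).ok(),
_ => return Err(ReadError::InvalidValue(replicate.span())),
}
.ok_or_else(|| ReadError::InvalidValue(replicate.span()))?;

// Push it that many times onto the array, limiting to the length of the
// output array.
elems.extend(
std::iter::repeat(elem).take(repeat.min(len - elems.len())),
);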
}
Err(err) => return Err(err),
};
}

Ok(Spanned::new(
elem_format.span(),
Arc::new(Value::ArrayLit(elems)),
))
}

fn read_limit(
&mut self,
reader: &BufferReader<'data>,
@@ -540,6 +617,20 @@ impl<'arena, 'env, 'data> Context<'arena, 'env, 'data> {
self.lookup_or_read_ref(pos, format)
}

fn read_or_succeed(
&mut self,
reader: &mut BufferReader<'data>,
cond: &ArcValue<'arena>,
format: &ArcValue<'arena>,
default: &ArcValue<'arena>,
) -> Result<ArcValue<'arena>, ReadError<'arena>> {
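// Force the condition first, as the other readers do for their arguments.
let cond = self.elim_env().force(cond);
match cond.as_ref() {
Value::ConstLit(Const::Bool(true)) => self.read_format(reader, format),
Value::ConstLit(Const::Bool(false)) => Ok(default.clone()),
_ => Err(ReadError::InvalidValue(cond.span())),
}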
}

fn lookup_ref<'context>(
&'context self,
pos: usize,
28 changes: 28 additions & 0 deletions fathom/src/core/semantics.rs
@@ -475,6 +475,7 @@ fn prim_step(prim: Prim) -> PrimStep {
Prim::U16And => const_step!([x, xst: U16, y, yst: U16] => Const::U16(u16::bitand(*x, *y), UIntStyle::merge(*xst, *yst))),
Prim::U16Or => const_step!([x, xst: U16, y, yst: U16] => Const::U16(u16::bitor(*x, *y), UIntStyle::merge(*xst, *yst))),
Prim::U16Xor => const_step!([x, xst: U16, y, yst: U16] => Const::U16(u16::bitxor(*x, *y), UIntStyle::merge(*xst, *yst))),
Prim::U16FromU8 => const_step!([x, xst: U8] => Const::U16(u16::from(*x), *xst)),

Prim::U32Eq => const_step!([x: U32, y: U32] => Const::Bool(x == y)),
Prim::U32Neq => const_step!([x: U32, y: U32] => Const::Bool(x != y)),
@@ -847,19 +848,46 @@ impl<'arena, 'env> ElimEnv<'arena, 'env> {
(Prim::FormatArray64, [Elim::FunApp(len), Elim::FunApp(elem)]) => {
Value::prim(Prim::Array64Type, [len.clone(), self.format_repr(elem)])
}
(Prim::FormatArray8Map, [Elim::FunApp(len), _, Elim::FunApp(map_fn), _]) => {
Value::prim(Prim::Array8Type, [len.clone(), self.format_repr(map_fn)])
}
(Prim::FormatArray16Map, [Elim::FunApp(len), _, Elim::FunApp(map_fn), _]) => {
Value::prim(Prim::Array16Type, [len.clone(), self.format_repr(map_fn)])
}
(Prim::FormatArray32Map, [Elim::FunApp(len), _, Elim::FunApp(map_fn), _]) => {
Value::prim(Prim::Array32Type, [len.clone(), self.format_repr(map_fn)])
}
(Prim::FormatArray64Map, [Elim::FunApp(len), _, Elim::FunApp(map_fn), _]) => {
Value::prim(Prim::Array64Type, [len.clone(), self.format_repr(map_fn)])
}
(Prim::FormatLimit8, [_, Elim::FunApp(elem)]) => return self.format_repr(elem),
(Prim::FormatLimit16, [_, Elim::FunApp(elem)]) => return self.format_repr(elem),
(Prim::FormatLimit32, [_, Elim::FunApp(elem)]) => return self.format_repr(elem),
(Prim::FormatLimit64, [_, Elim::FunApp(elem)]) => return self.format_repr(elem),
(Prim::FormatRepeatUntilEnd, [Elim::FunApp(elem)]) => {
Value::prim(Prim::ArrayType, [self.format_repr(elem)])
}
(Prim::FormatRepeatUntilFull8, [Elim::FunApp(len), Elim::FunApp(elem), _]) => {
Value::prim(Prim::Array8Type, [len.clone(), self.format_repr(elem)])
}
(Prim::FormatRepeatUntilFull16, [Elim::FunApp(len), Elim::FunApp(elem), _]) => {
Value::prim(Prim::Array16Type, [len.clone(), self.format_repr(elem)])
}
(Prim::FormatRepeatUntilFull32, [Elim::FunApp(len), Elim::FunApp(elem), _]) => {
Value::prim(Prim::Array32Type, [len.clone(), self.format_repr(elem)])
}
(Prim::FormatRepeatUntilFull64, [Elim::FunApp(len), Elim::FunApp(elem), _]) => {
Value::prim(Prim::Array64Type, [len.clone(), self.format_repr(elem)])
}
(Prim::FormatLink, [_, Elim::FunApp(elem)]) => {
Value::prim(Prim::RefType, [elem.clone()])
}
(Prim::FormatDeref, [Elim::FunApp(elem), _]) => return self.format_repr(elem),
(Prim::FormatStreamPos, []) => Value::prim(Prim::PosType, []),
(Prim::FormatSucceed, [Elim::FunApp(elem), _]) => return elem.clone(),
(Prim::FormatOrSucceed, [_, Elim::FunApp(elem), _]) => {
return self.format_repr(elem)
}
(Prim::FormatFail, []) => Value::prim(Prim::VoidType, []),
(Prim::FormatUnwrap, [Elim::FunApp(elem), _]) => return elem.clone(),
(Prim::ReportedError, []) => Value::prim(Prim::ReportedError, []),