Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified AST #58

Open
pepegar opened this issue Jan 2, 2019 · 0 comments
Open

Unified AST #58

pepegar opened this issue Jan 2, 2019 · 0 comments
Assignees

Comments

@pepegar
Copy link
Member

pepegar commented Jan 2, 2019

introducing UAST

disclaimer: UAST term is borrowed from https://doc.bblf.sh/uast/uast-specification.html


Currently we declare different ASTs for different protocols. We
have AvroF for Avro, ProtobufF for Protocol Buffers, MuF for
describing Mu services...

Table of Contents

the problem

However, we are repeating ourselves a lot. See the implementation of
AvroF:

  ...
  final case class TNull[A]()                  extends AvroF[A]
  final case class TBoolean[A]()               extends AvroF[A]
  final case class TInt[A]()                   extends AvroF[A]
  final case class TLong[A]()                  extends AvroF[A]
  final case class TFloat[A]()                 extends AvroF[A]
  final case class TDouble[A]()                extends AvroF[A]
  final case class TBytes[A]()                 extends AvroF[A]
  final case class TString[A]()                extends AvroF[A]
  final case class TNamedType[A](name: String) extends AvroF[A]
  final case class TArray[A](item: A)          extends AvroF[A]
  final case class TMap[A](values: A)          extends AvroF[A]
  ...

And the implementation of MuF:

  ...
  final case class TNull[A]()                                        extends MuF[A]
  final case class TDouble[A]()                                      extends MuF[A]
  final case class TFloat[A]()                                       extends MuF[A]
  final case class TInt[A]()                                         extends MuF[A]
  final case class TLong[A]()                                        extends MuF[A]
  final case class TBoolean[A]()                                     extends MuF[A]
  final case class TString[A]()                                      extends MuF[A]
  final case class TByteArray[A]()                                   extends MuF[A]
  final case class TNamedType[A](name: String)                       extends MuF[A]
  final case class TOption[A](value: A)                              extends MuF[A]
  final case class TEither[A](left: A, right: A)                     extends MuF[A]
  final case class TList[A](value: A)                                extends MuF[A]
  final case class TMap[A](value: A)                                 extends MuF[A]
  ...

We repeat basically all the leaves of the AST, because all different
IDLs allow to represent more or less the same types, with minor
differences.

Also, having all those different ASTs means that we cannot create
functions that work on generic ASTs, but need to specify them.

The solution

I would prefer using a unique set of classes representing different
types possible in any schema, and then the specific schemata pick and
choose from them in order to assemble their ADT, as follows.

First we would need to declare all our base leaves:

  final case class TNull[A]()
  final case class TBoolean[A]()
  final case class TInt[A]()
  final case class TLong[A]()
  final case class TFloat[A]()
  final case class TDouble[A]()
  final case class TBytes[A]()
  final case class TString[A]()
  final case class TNamedType[A](name: String)
  final case class TArray[A](item: A)          // sugar over Generic(NamedType("Map"), A, A)
  final case class TMap[A](keys: A, values: A) // sugar over Generic(NamedType("Map"), A, A)
  final case class TFixed[A](name: String, size: Int)
  final case class TOption[A](value: A)          // sugar over Generic(NamedType("Option"), A)
  final case class TEither[A](left: A, right: A) // sugar over Generic(NamedType("Option"), A)
  final case class TList[A](value: A)            // sugar over Generic(NamedType("Option"), A)
  final case class TGeneric[A](generic: A, params: List[A])
  final case class TRecord[A](name: String, fields: List[Field[A]])
  final case class TEnum[A](name: String, symbols: List[String])
  final case class TUnion[A](options: NonEmptyList[A])

N.B: these classes do not extend anything, they're plain old case
classes.

Then, for combining those case classes into an Algebraic Data Type, we
can use coproducts. Coproducts and its motivations are well explained
in this paper.

Higher kind coproducts are implemented in several libraries in the
scala FP world, such as Scalaz, Cats, and Iota. In order to do some
dogfooding, we'll be using iota's implementation.

Then, we would assemble our types using coproducts, as follows:

import iota._
import iota.TListK.:::

type MuType[A] = CopK[
  TNull :::
  TDouble :::
  TFloat :::
  TInt :::
  TLong :::
  TDouble :::
  TBoolean :::
  TString :::
  TBytes :::
  TNamedType :::
  TOption :::
  TEither :::
  TList :::
  TGeneric :::
  TArray :::
  TMap :::
  TRecord :::
  TEnum :::
  TUnion :::
  TNilK,
  A]


type AvroType[A] = CopK[
  TNull :::
  TBoolean :::
  TInt :::
  TLong :::
  TFloat :::
  TDouble :::
  TBytes :::
  TString :::
  TNamedType :::
  TArray :::
  TMap :::
  TRecord :::
  TEnum :::
  TUnion :::
  TFixed :::
  TNilK,
  A]

See how, in this case, each AST declaration decided to pick only the
types it needed.

Usage

The only missing part of this idea is how to use it. One of the
features motivating this change is code reuse, and generic
programming. In the following code block you can see how some generic
transformations on these ASTs could be implemented.

  def desugarList[F[α] <: CopK[_, α], A](
      implicit
      L: CopK.Inject[TList, F],
      G: CopK.Inject[TGeneric, F],
      N: CopK.Inject[TNamedType, F],
      A: Embed[F, A]
  ): Trans[F, F, A] =
    Trans {
      case L(TList(t)) => generic[F, A](namedType[F, A]("List").embed, List(t))
      case x           => x
    }

  def desugarOption[F[α] <: CopK[_, α], A](
      implicit
      O: CopK.Inject[TOption, F],
      G: CopK.Inject[TGeneric, F],
      N: CopK.Inject[TNamedType, F],
      A: Embed[F, A]
  ): Trans[F, F, A] =
    Trans {
      case O(TOption(a)) => generic[F, A](namedType[F, A]("Option").embed, List(a))
      case x             => x
    }

  def desugarEither[F[α] <: CopK[_, α], A](
      implicit
      E: CopK.Inject[TEither, F],
      G: CopK.Inject[TGeneric, F],
      N: CopK.Inject[TNamedType, F],
      A: Embed[F, A]
  ): Trans[F, F, A] =
    Trans {
      case E(TEither(a, b)) => generic[F, A](namedType[F, A]("Either").embed, List(a, b))
      case x                => x
    }

As you can see, these functions do not work on a specific AST.
Instead, they are generic, meaning that they will work on any AST for
which some injections exist.

work to be done

We need to implement a way of traverse derivation for Iota's
coproducts. Traverse can be derived mechanically, and I think the
fastest way would be a generic-like derivation a la shapeless.

resources

some useful resources for understanding this approach are this
talk
and this paper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants