spickle

spickle is a library for serializing, parsing and pickling data in Scala.

Parsers and serializers are often written separately from each other. As a result, it is easy to update one and forget the other. Without good (automated) tests in place beforehand, this may ultimately lead to bugs in production code.

This library supports implementing parsers and serializers separately, but it also combines the two into one API, so that you get both a parser and a serializer from a single piece of code (known as a pickler in the research literature).

Complex parsers, serializers and picklers are constructed by starting with simple ones and combining them using the many operators defined in this library. Simple building blocks are provided to start off with. This library currently supports these building blocks for parsing, serializing and pickling Strings and XML structures.

Binaries

Binaries and dependency information for Maven, Ivy, Gradle and others can be found at http://search.maven.org.

Example for Gradle:

compile 'com.github.rvanheest:spickle_2.12:x.y.z'

and for Maven:

<dependency>
    <groupId>com.github.rvanheest</groupId>
    <artifactId>spickle_2.12</artifactId>
    <version>x.y.z</version>
</dependency>

and for Ivy:

<dependency org="com.github.rvanheest" name="spickle_2.12" rev="x.y.z" />

and for SBT:

libraryDependencies += "com.github.rvanheest" %% "spickle" % "x.y.z"

Build

To build:

    $ git clone git@github.com:rvanheest/spickle.git
    $ cd spickle/
    $ mvn clean install

Bugs and Feedback

For bugs, questions and discussions, please use GitHub Issues.

License

spickle is available under the Apache 2 License. Please see the license for more information.

Examples

Parse XML

To parse an XML structure like

<person>
    <name>Jim Jones</name>
    <age>36</age>
    <favoriteNumber>1</favoriteNumber>
    <favoriteNumber>3</favoriteNumber>
    <favoriteNumber>5</favoriteNumber>
    <favoriteNumber>7</favoriteNumber>
    <favoriteNumber>11</favoriteNumber>
</person>

we define a Parser like:

import com.github.rvanheest.spickle.parser.xml.XmlParser._
import scala.util.Success

case class Person(name: String, age: Int, favoriteNumbers: Seq[Int])

def parsePerson: XmlParser[Person] = {
  branchNode("person") {
    for {
      name <- stringNode("name")
      age <- stringNode("age").toInt
      favoriteNumbers <- stringNode("favoriteNumber").toInt.many
    } yield Person(name, age, favoriteNumbers)
  }
}

// xml is assumed to hold the <person> element shown above
val (Success(person), remainingXml) = parsePerson.parse(xml)
// person: Person("Jim Jones", 36, Seq(1, 3, 5, 7, 11))
// remainingXml should be empty

// to only get the parsed object
val Success(parsedPerson) = parsePerson.eval(xml)

// to only get the remaining xml
val unparsedXml = parsePerson.execute(xml)

Parse and serialize XML

When we also need a serializer, so that the object can be written back to XML, a Pickle can be used instead. We define:

import com.github.rvanheest.spickle.pickle.xml.XmlPickle._
import scala.util.Success

case class Person(name: String, age: Int, favoriteNumbers: Seq[Int])

def picklePerson: XmlPickle[Person] = {
  branchNode("person") {
    for {
      name <- stringNode("name").seq[Person](_.name)
      age <- stringNode("age").toInt.seq[Person](_.age)
      favoriteNumbers <- stringNode("favoriteNumber").toInt.many.seq[Person](_.favoriteNumbers)
    } yield Person(name, age, favoriteNumbers)
  }
}

val (Success(person), remainingXml) = picklePerson.parse(xml)
// person: Person("Jim Jones", 36, Seq(1, 3, 5, 7, 11))
// remainingXml should be empty

val Success(Seq(serializedPersonXml)) = picklePerson.serialize(person, Seq.empty)
// serializedPersonXml is equal to the original xml that we parsed

Notice that this Pickle[Person] looks almost the same as the Parser[Person] that was defined in the previous example. The only difference is the .seq[Person](...), which is the part the serializer uses to access the particular fields of the Person object. Because stringNode("age").toInt and stringNode("favoriteNumber").toInt.many already handle the conversion in both directions, .seq[Person](...) requires no further transformation beyond the field accessor.

More XML examples

For more examples of XML parsing, including attributes, namespaces, <xs:all>, etc., check out the examples project.

String parsing

In the examples project, an ExpressionParser is defined, which takes an arithmetic expression as input (like "(2 * 3) + (2 + 7)") and evaluates it (for this input, it returns 15). Note that the syntax is equivalent to that of the XML parser above, but with atomic parsers on Strings.

Notice, however, that here the structure of the original expression is not preserved and hence it is impossible to define a serializer or pickler for this example.

Technical motivation

As is well known, a parser can be expressed as a function S -> (T, S), which takes a state as input and transforms it into an object T and a remaining state. Parser combinators can then be defined to combine these functions and construct more complex parsers from them. Since a parser is a monad, composition can be done using the familiar map and flatMap operators. Therefore we can use Scala's for-comprehension to neatly define our parsers. From the monadic operators, more complex operators such as maybe, many, atLeastOnce, takeWhile, etc. can also be constructed.
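
The idea above can be illustrated with a minimal, self-contained sketch. This is not spickle's actual implementation: the names Parser, satisfy, digit, char, number and sum are hypothetical and chosen for illustration only, and the result is wrapped in a Try (as in the examples earlier in this README) so that failure can be represented.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object ParserSketch {
  // A parser is a function from a state S to a result and the remaining state.
  final case class Parser[S, T](parse: S => (Try[T], S)) {
    def map[U](f: T => U): Parser[S, U] = Parser { s =>
      val (result, rest) = parse(s)
      (result.map(f), rest)
    }

    // Run this parser, then feed its result and the remaining state to the next one.
    def flatMap[U](f: T => Parser[S, U]): Parser[S, U] = Parser { s =>
      parse(s) match {
        case (Success(t), rest) => f(t).parse(rest)
        case (Failure(e), rest) => (Failure(e), rest)
      }
    }

    // Zero or more repetitions; stops at the first failure without consuming it.
    def many: Parser[S, List[T]] = Parser { s =>
      @tailrec
      def loop(state: S, acc: List[T]): (Try[List[T]], S) = parse(state) match {
        case (Success(t), rest) => loop(rest, t :: acc)
        case (Failure(_), _)    => (Success(acc.reverse), state)
      }
      loop(s, Nil)
    }
  }

  // Atomic parser: consume one character that satisfies the predicate.
  def satisfy(p: Char => Boolean): Parser[String, Char] = Parser { s =>
    s.headOption match {
      case Some(c) if p(c) => (Success(c), s.tail)
      case _               => (Failure(new NoSuchElementException(s"unexpected input: '$s'")), s)
    }
  }

  val digit: Parser[String, Char] = satisfy(_.isDigit)
  def char(c: Char): Parser[String, Char] = satisfy(_ == c)

  // One or more digits, converted to an Int.
  val number: Parser[String, Int] = for {
    first <- digit
    rest  <- digit.many
  } yield (first :: rest).mkString.toInt

  // Two numbers separated by '+', added together.
  val sum: Parser[String, Int] = for {
    left  <- number
    _     <- char('+')
    right <- number
  } yield left + right
}
```

With these definitions, ParserSketch.sum.parse("12+30") yields (Success(42), ""), while an input such as "12-30" yields a Failure together with the unconsumed "-30".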

Taking the dual of the function above results in a function (T, S) -> S, which takes both an object T and the current state and produces a new state S that incorporates the T. This is typically called a serializer. This type does not define a monad, but it can be viewed as a contravariant functor, and also as a monoid. Hence we can define operators like contramap and combine to construct complex serializers such as maybe, many, atLeastOnce, takeWhile, etc.
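
A serializer with contramap and combine can be sketched in the same spirit. Again, this is an illustration and not spickle's implementation: Serializer, text, literal, intText and the two-field Person are hypothetical names. The sketch assumes that the state is the text that follows, so every serializer prepends its output to the state.

```scala
import scala.util.{Success, Try}

object SerializerSketch {
  // A serializer takes a value T and the current state S and produces a new state.
  final case class Serializer[S, T](serialize: (T, S) => Try[S]) {
    // Contravariant functor: adapt the input type rather than the output type.
    def contramap[U](f: U => T): Serializer[S, U] =
      Serializer[S, U]((u, s) => serialize(f(u), s))

    // Combine two serializers for the same value. Because each serializer
    // prepends to the state, `that` runs first, so that the output of `this`
    // ends up in front of the output of `that`.
    def combine(that: Serializer[S, T]): Serializer[S, T] =
      Serializer[S, T]((t, s) => that.serialize(t, s).flatMap(serialize(t, _)))
  }

  // Atomic serializer: prepend a string to the text that follows.
  val text: Serializer[String, String] =
    Serializer[String, String]((t, rest) => Success(t + rest))

  // Emit a fixed literal, ignoring the value being serialized.
  def literal[T](lit: String): Serializer[String, T] = text.contramap[T](_ => lit)

  // Serialize an Int by first turning it into a String.
  val intText: Serializer[String, Int] = text.contramap[Int](_.toString)

  final case class Person(name: String, age: Int)

  // name, a comma, then age; each part only needs a field accessor.
  val person: Serializer[String, Person] =
    text.contramap[Person](_.name)
      .combine(literal[Person](","))
      .combine(intText.contramap[Person](_.age))
}
```

Here SerializerSketch.person.serialize(Person("Jim Jones", 36), "") yields Success("Jim Jones,36").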

Combining the parser and serializer together results in a pickler. Just like before, we aim to compose complex picklers out of simpler ones, which in turn are composed out of simple parsers and serializers. With a little extra work to get around the contravariant functor of the serializer, we can define the monadic operators for a pickler as well, leading once again to more complex operators that are defined out of the box.
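
The "little extra work" can be sketched as follows. The sketch borrows the seq name from the XML example earlier in this README, but everything else (Pickle as a case class, SeqBuilder, xmap, takeWhile1, literal, number and the two-field Person) is a simplified assumption for illustration and may differ from how spickle implements it. The idea is that seq supplies the field accessor the serializer needs, after which map and flatMap can be defined for both directions at once.

```scala
import scala.util.{Failure, Success, Try}

object PickleSketch {
  // A pickle bundles a parser and a serializer for the same type A.
  final case class Pickle[S, A](parse: S => (Try[A], S), serialize: (A, S) => Try[S]) {
    // `get` tells the serializer how to extract the A from the B being built.
    def seq[B](get: B => A): SeqBuilder[S, A, B] = SeqBuilder(this, get)

    // Convert the pickled type in both directions.
    def xmap[B](to: A => B, from: B => A): Pickle[S, B] = seq[B](from).map(to)
  }

  final case class SeqBuilder[S, A, B](pickle: Pickle[S, A], get: B => A) {
    def map(f: A => B): Pickle[S, B] = Pickle[S, B](
      parse = s => { val (a, rest) = pickle.parse(s); (a.map(f), rest) },
      serialize = (b, s) => pickle.serialize(get(b), s)
    )

    def flatMap(f: A => Pickle[S, B]): Pickle[S, B] = Pickle[S, B](
      parse = s => {
        pickle.parse(s) match {
          case (Success(a), rest) => f(a).parse(rest)
          case (Failure(e), rest) => (Failure(e), rest)
        }
      },
      serialize = (b, s) => {
        val a = get(b)
        // Serializers prepend: write what follows first, then this part in front.
        f(a).serialize(b, s).flatMap(pickle.serialize(a, _))
      }
    )
  }

  // Atomic pickle: a non-empty run of characters that satisfy `p`.
  def takeWhile1(p: Char => Boolean): Pickle[String, String] = Pickle[String, String](
    parse = s => {
      val (taken, rest) = s.span(p)
      if (taken.nonEmpty) (Success(taken), rest)
      else (Failure(new NoSuchElementException(s"unexpected input: '$s'")), s)
    },
    serialize = (a, rest) => {
      if (a.nonEmpty && a.forall(p)) Success(a + rest)
      else Failure(new IllegalArgumentException(s"cannot serialize '$a'"))
    }
  )

  // Atomic pickle: a fixed piece of text that carries no information.
  def literal(lit: String): Pickle[String, Unit] = Pickle[String, Unit](
    parse = s => {
      if (s.startsWith(lit)) (Success(()), s.drop(lit.length))
      else (Failure(new NoSuchElementException(s"expected '$lit'")), s)
    },
    serialize = (_, rest) => Success(lit + rest)
  )

  val number: Pickle[String, Int] = takeWhile1(_.isDigit).xmap[Int](_.toInt, _.toString)

  final case class Person(name: String, age: Int)

  // One definition that both parses and serializes "name,age".
  val person: Pickle[String, Person] = for {
    name <- takeWhile1(_ != ',').seq[Person](_.name)
    _    <- literal(",").seq[Person](_ => ())
    age  <- number.seq[Person](_.age)
  } yield Person(name, age)
}
```

With this sketch, PickleSketch.person.parse("Jim Jones,36") yields (Success(Person("Jim Jones", 36)), ""), and serializing that Person with an empty state yields Success("Jim Jones,36") again.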

Next to combinators, this library defines specialized parsers, serializers and picklers for commonly used data structures. String parsing can be quite cumbersome, and writing an interpreter for a string is often not a trivial task. Parsing and serializing XML can also cost a lot of time and effort, particularly to ensure that serialize(parse(xml)) == xml. When parsers and serializers are written separately (for example using Scala's XML library), it isn't hard to write code where serialize(parse(xml)) != xml. For these reasons, spickle defines basic building blocks for both Strings and XML. With these, you can easily write string parsers and interpreters, as well as simple parsers, serializers and picklers for XML structures.

FAQ

What does spickle stand for?

spickle contains the word 'pickle', which is what this library is all about. Since the library targets Scala as its programming language, this became spickle.

Why do we need a library like spickle? Isn't scala-xml good enough? Can't we do String parsing with regular expressions?

This library grew out of a frustration I had while converting a large (more than 5000 lines) XSD to Scala objects and writing a parser using scala-xml. I found that in some cases, the xpath syntax did not suffice and that I had to do magic in order to get the XML parsed correctly. The thought of also having to write a serializer and maintain the XSD, the object structure, the parser and the serializer made me question whether there was a better way of doing this. That's when I started looking into picklers and in particular how they are implemented in other functional languages like Haskell. While the basic idea was quite nice, I thought I could do better. Hence I set out to implement a functional parser and serializer and combine the two into a pickler. While studying parser combinators, I found that the same concept could also be used for String parsing, and many more applications.

Why only specialize for XML and String? Why not have specializations for JSON as well, for example?

The main reason spickle only supports XML and String is that that's what I set out with. I simply haven't had time yet to add other data types. Besides, the reason I haven't added JSON as a third specialization (yet) is that Scala's de facto standard for JSON parsing/serialization (json4s) basically already defines a pickler: once your object model conforms to the JSON, you can use read and write to convert between the object model and JSON.

I have another common data type. Can I use spickle for that?

Sure! I would start with the parser, after which the serializer and pickler are relatively simple to write. Think of the atomic units in your data type and write a parser and serializer for those first. Then use these atomics to construct more complex units of your data type, until you have parsers for the whole data type. Look at the examples for strings and XML for inspiration.

Besides, always feel free to contribute to spickle! If you think other people might benefit from your picklers as well, send a pull request.