Qri Starlark Transformation Syntax

Qri ("query") is about datasets. Transformations are repeatable scripts for generating a dataset. Starlark is a scripting language from Google that feels a lot like Python. This package implements starlark as a transformation syntax. Starlark transformations are about as close as one can get to the full power of a programming language as a transformation syntax. Often you need this degree of control to generate a dataset.

Typical examples of a starlark transformation include:

  • combining paginated calls to an API into a single dataset
  • downloading unstructured data from the internet and extracting structured data from it
  • pulling raw data off the web & turning it into a dataset
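
As a rough illustration of the first item, here is a minimal sketch of combining paginated results into a single list. It is plain Python written in a Starlark-compatible style (no while loops). `PAGES`, `fetch_page`, `combine_pages` and `max_pages` are hypothetical stand-ins for a real paginated API call and are not part of qri or startf.

```python
# Hypothetical canned pages standing in for a paginated API (not a real endpoint).
PAGES = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
]

def fetch_page(page):
    # Stand-in for an HTTP request; returns an empty list past the last page.
    if page < len(PAGES):
        return PAGES[page]
    return []

def combine_pages(max_pages=100):
    # Iterate over a bounded range and stop early at the first empty page.
    combined = []
    for page in range(max_pages):
        rows = fetch_page(page)
        if not rows:
            break
        combined.extend(rows)
    return combined

print(combine_pages())
```

In a real transform the combined list would be what gets passed to the dataset body; the upper bound on pages keeps the loop finite even if the API never returns an empty page.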

We're excited about starlark for a few reasons:

  • python syntax - many people working in data science these days write python, we like that, starlark likes that. dope.
  • deterministic subset of python - unlike python, starlark removes properties that reduce introspection into code behaviour. things like while loops and recursive functions are omitted, making it possible for qri to infer how a given transformation will behave.
  • parallel execution - thanks to this deterministic requirement (and lack of global interpreter lock) starlark functions can be executed in parallel. Combined with peer-2-peer networking, we're hoping to advance transformations toward peer-driven distribed computing. More on that in the coming months.
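
To make the "no while loops, no recursion" point concrete, here is a small sketch of walking a nested structure with a bounded for loop and an explicit stack instead of a recursive function. It is plain Python restricted to constructs that Starlark also offers; `total_value`, `max_nodes` and the shape of the example tree are assumptions made up for illustration.

```python
def total_value(tree, max_nodes=10000):
    # Sum the "value" field of every node without recursion or while loops.
    stack = [tree]
    total = 0
    for _ in range(max_nodes):
        if not stack:
            break
        node = stack.pop()
        total += node["value"]
        # Children are optional; a missing key means a leaf node.
        stack.extend(node.get("children", []))
    # Note: if max_nodes is reached first, the total covers only the visited nodes.
    return total

tree = {
    "value": 1,
    "children": [
        {"value": 2},
        {"value": 3, "children": [{"value": 4}]},
    ],
}
print(total_value(tree))
```

Because the loop is bounded up front, the maximum amount of work is known before the script runs, which is the kind of property the list above is describing.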

Getting started

If you're mainly interested in learning how to write starlark transformations, our documentation is a better place to start. If you're interested in contributing to the way starlark transformations work, this is the place!

The easiest way to see starlark transformations in action is to use qri. This startf package powers all the starlark stuff in qri. Assuming you have the Go programming language installed, the following should work from a terminal:

# get this package
$ go get github.com/qri-io/startf

# navigate to package
$ cd $GOPATH/src/github.com/qri-io/startf

# run tests
$ go test ./...


Often the next steps are to install [qri](https://github.com/qri-io/qri), mess with this `startf` package, then rebuild qri with your changes to see them in action within qri itself.

## Starlark Special Functions

_Special Functions_ are the core of a starlark transform script. Here's an example of a simple special function that sets the body of a dataset to a constant:

```python
def transform(ds, ctx):
  # set the dataset body to a constant list
  ds.set_body(["hello", "world"])
```

Here's something slightly more complicated (but still very contrived) that modifies a dataset by adding up the length of all of the elements in a dataset body:

def transform(ds, ctx):
  body = ds.get_body()
  # start from zero so count is defined even when there is no body
  count = 0
  if body != None:
    for entry in body:
      count += len(entry)
  ds.set_body([{"total": count}])
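
To try this logic outside qri, one option is a small stand-in object in plain Python. The sketch below is only a test harness: `FakeDataset` is hypothetical and mimics just the `get_body` and `set_body` calls used here; it is not qri's dataset type, and classes like this are Python, not Starlark.

```python
class FakeDataset:
    # Hypothetical stand-in that only mimics get_body/set_body.
    def __init__(self, body=None):
        self._body = body

    def get_body(self):
        return self._body

    def set_body(self, body):
        self._body = body

def transform(ds, ctx):
    body = ds.get_body()
    # Initialise the counter first so it exists even when there is no body.
    count = 0
    if body != None:
        for entry in body:
            count += len(entry)
    ds.set_body([{"total": count}])

ds = FakeDataset([["a", "b"], ["c"]])
transform(ds, None)  # ctx is unused here, so None is enough for the sketch
print(ds.get_body())
```

With a body of two entries of lengths 2 and 1, the new body holds a single row with a total of 3; with no body at all the total is 0.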

Starlark special functions have a few rules on top of starlark itself:

  • Special functions always accept a transformation context (the ctx arg)
  • When you define a special function, qri calls it for you
  • All special functions are optional (you don't need to define them), except transform. transform is required.
  • Special functions are always called in the same order

Another important special function is download, which allows access to the http package:

load("http.star", "http")

def download(ctx):
  data = http.get("http://example.com/data.json")  
  return data

The result of this special function can be accessed using ctx.download:

def transform(ds, ctx):
  ds.set_body(ctx.download)
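
One way to picture how these pieces fit together is a toy driver that calls the special functions in a fixed order. This is only a mental model in plain Python, not startf's actual implementation: `Context`, `FakeDataset`, `run_special_functions` and the canned download payload are all assumptions made up for this sketch.

```python
class Context:
    # Hypothetical context object; only the download attribute is modelled.
    def __init__(self):
        self.download = None

class FakeDataset:
    # Hypothetical stand-in that only mimics get_body/set_body.
    def __init__(self):
        self._body = None

    def get_body(self):
        return self._body

    def set_body(self, body):
        self._body = body

def run_special_functions(script, ds):
    # script maps special function names to callables.
    if "transform" not in script:
        raise ValueError("transform is required")
    ctx = Context()
    # download is optional; when defined, it runs first and its result
    # is stored on the context for transform to read.
    if "download" in script:
        ctx.download = script["download"](ctx)
    script["transform"](ds, ctx)
    return ds

def download(ctx):
    # Canned payload standing in for an http.get(...) call.
    return [{"city": "Toronto"}]

def transform(ds, ctx):
    ds.set_body(ctx.download)

ds = run_special_functions({"download": download, "transform": transform}, FakeDataset())
print(ds.get_body())
```

The driver rejects a script without transform and treats download as optional, mirroring the rules listed earlier.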

More docs on the provided API are coming soon.

Running a transform

Let's say the above function is saved as transform.star. You can run it to create a new dataset by using:

qri save --file=transform.star me/dataset_name

Or, you can add more details by creating a dataset file (saved as dataset.yaml, for example) with additional structure:

name: dataset_name
transform:
  scriptpath: transform.star
meta:
  title: My awesome dataset

Then invoke qri:

qri save --file=dataset.yaml

Fun! More info over on our docs site

