-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
better set_body #42
better set_body #42
Conversation
set_body wasn't doing the right thing when it comes to determining the destination datset structure for writing a body. We now inherit structure from existing dataset data, and only fall back to json if no structure exists. The general problem here is a separation of concerns between data in the starlark runtime and dataset serialization formats. Those should always be seperate. closes qri-io/qri#756
the mental model behind 'raw' and 'data_format' is off. 'data_format' implies that we're changing the ultimate data format, which is not the case. Now set_body has only one optional argument: 'parse_as', which defaults to the empty string. By default qri assumes the data value provided to set_body is an iterable starlark data structure (tuple, set, list, dict). When parse_as is set, set_body assumes the provided body value will be a string of serialized structured data in the given format. valid parse_as values are "json", "csv", "cbor", "xlsx". We'll need more tests to confirm each format works properly, but all this is doing under the hood is overriding all parsing and setting body bytes directly, which is analogous to "qri save --body file.[parse_as] me/dataset". BREAKING CHANGE: ds.set_body raw and data_format params have been replaced by parse_as
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Have just a few tiny suggestions
ds/testdata/test.star
Outdated
|
||
# csv_ds is a global variable provided by dataset_test.go | ||
# "cycling" csv data through starlark shouldn't have significant effects on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about "round-tripping" instead of "cycling"
ds/dataset.go
Outdated
} | ||
|
||
return starlark.None, fmt.Errorf("expected raw data for body to be a string") | ||
d.write.SetBodyFile(qfs.NewMemfileBytes(fmt.Sprintf("data.%s", df), []byte(string(str)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use "body.%s" instead of "data.%s".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally. makes me want this more: qri-io/dataset#188
ds/dataset_test.go
Outdated
}, | ||
}, | ||
} | ||
ds.SetBodyFile(qfs.NewMemfileBytes("data.csv", []byte(text))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use "body.csv" instead of "data.csv"
set_body wasn't doing the right thing when it comes to determining the destination datset structure for writing a body. We now inherit structure from existing dataset data, and only fall back to json if no structure exists.
The general problem here is a separation of concerns between data in the starlark runtime and dataset serialization formats. Those should always be seperate, with set_body giving qri the data it wants serialized, or skipping serialization with
ds.set_body(data, parse_as='format'