Merge pull request datafusion-contrib#26 from chmp/feature/release-notes
Update changelog
Showing 8 changed files with 162 additions and 40 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,44 @@ | ||
# Change log | ||
|
||
# 0.5.0 | ||
## 0.6.0 | ||
|
||
### Removed support for arrow in favor of arrow2 | ||
|
||
Drop support for [arrow][] in favor of [arrow2][]. Arrow2 is a smaller, faster to | ||
build implementation of the Arrow format that follows semver. It is also used by | ||
[polars][]. That said, most of the implementation is pretty generic and [arrow][] | ||
support could be added. To convert arrow2 arrays into arrow arrays and record | ||
batches, see the [arrow2-to-arrow][] example. | ||
|
||
### More flexible support for Rust / Arrow features | ||
|
||
`serde_arrow` now supports many more Rust and Arrow features. | ||
|
||
- Rust: Struct, Lists, Maps, Enums, Tuples | ||
- Arrow: Struct, List, Maps, Unions, ... | ||
|
||
### Removal of custom schema APIs | ||
|
||
`serde_arrow` no longer relies on its own schema object. Now all schema | ||
information is retrieved from arrow fields with additional metadata. | ||
|
||
### More flexible APIs | ||
|
||
In addition to the previous API that worked on a sequence of records, | ||
`serde_arrow` can now also operate on a sequence of individual items | ||
(`serialize_into_array`, `deserialize_from_array`) and on single | ||
items (`ArraysBuilder`). | ||
|
||
### Support for dictionary encoded strings (categories) | ||
|
||
`serde_arrow` supports dictionary encoding for string arrays. Each distinct | ||
string is stored once in a lookup table and the array holds keys into that | ||
table, so repeated string values are not stored repeatedly. | ||
|
||
## 0.5.0 | ||
|
||
- Bump arrow to version 16.0.0 | ||
|
||
[arrow]: https://github.com/apache/arrow-rs | ||
[arrow2]: https://github.com/jorgecarleitao/arrow2 | ||
[polars]: https://github.com/pola-rs/polars | ||
[arrow2-to-arrow]: ./arrow2-to-arrow |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# Quickstart guide | ||
|
||
**Contents** | ||
|
||
1. [Working with date time objects](#working-with-date-time-objects) | ||
2. [Dictionary encoding for strings](#dictionary-encoding-for-strings) | ||
3. [Convert from arrow2 to arrow arrays](#convert-from-arrow2-to-arrow-arrays) | ||
|
||
## Working with date time objects | ||
|
||
When using `chrono`'s `DateTime<Utc>` or `NaiveDateTime`, the values are | ||
encoded as strings by default. To store them as `Date64` columns, the data type | ||
has to be modified. | ||
|
||
For example: | ||
|
||
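```rust | ||
// serde_arrow items such as serialize_into_fields are assumed to be in scope | ||
use chrono::NaiveDateTime; | ||
use serde::{Deserialize, Serialize}; | ||
#[derive(Debug, PartialEq, Serialize, Deserialize)] | ||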
struct Record { | ||
val: NaiveDateTime, | ||
} | ||
|
||
let records: &[Record] = &[ | ||
Record { | ||
val: NaiveDateTime::from_timestamp(12 * 60 * 60 * 24, 0), | ||
}, | ||
Record { | ||
val: NaiveDateTime::from_timestamp(9 * 60 * 60 * 24, 0), | ||
}, | ||
]; | ||
|
||
let mut fields = serialize_into_fields(records, Default::default()).unwrap(); | ||
``` | ||
|
||
The traced field `val` will be of type `Utf8`. To store it as a `Date64` field, | ||
modify the data type and the metadata of the field: | ||
|
||
```rust | ||
let val_field = find_field_mut(&mut fields, "val").unwrap(); | ||
val_field.data_type = DataType::Date64; | ||
val_field.metadata = Strategy::NaiveStrAsDate64.into(); | ||
``` | ||
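To make the target representation concrete: in the Arrow format a `Date64` value is a 64-bit integer counting milliseconds since the UNIX epoch. The following is a minimal sketch using only the standard library, not `serde_arrow`'s implementation; the names `to_date64_millis` and `example_column` are hypothetical and exist only for illustration.

```rust
/// Number of milliseconds in one second.
const MILLIS_PER_SECOND: i64 = 1_000;
/// Number of seconds in one day.
const SECONDS_PER_DAY: i64 = 24 * 60 * 60;

/// Convert a UNIX timestamp (whole seconds plus sub-second nanoseconds) into
/// the `i64` a `Date64` column holds: milliseconds since 1970-01-01T00:00:00.
/// Hypothetical helper for illustration, not part of `serde_arrow`.
fn to_date64_millis(secs: i64, nanos: u32) -> i64 {
    secs * MILLIS_PER_SECOND + i64::from(nanos / 1_000_000)
}

/// The two timestamps from the records above (12 and 9 days after the epoch)
/// expressed as `Date64` values.
fn example_column() -> Vec<i64> {
    vec![
        to_date64_millis(12 * SECONDS_PER_DAY, 0),
        to_date64_millis(9 * SECONDS_PER_DAY, 0),
    ]
}
```

Under this assumption, the two records in the example correspond to the integers `1_036_800_000` and `777_600_000`, which is considerably more compact than the default string encoding.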
|
||
## Dictionary encoding for strings | ||
|
||
To encode strings with repeated values via a dictionary, the data type of the | ||
corresponding field must be changed from `Utf8` or `LargeUtf8` to `Dictionary`. | ||
|
||
For an existing field this can be done via: | ||
|
||
```rust | ||
field.data_type = DataType::Dictionary( | ||
// the integer type used for the keys | ||
IntegerType::UInt32, | ||
// the data type of the values | ||
Box::new(DataType::Utf8), | ||
// serde_arrow does not support generating sorted dictionaries | ||
false, | ||
); | ||
``` | ||
|
||
To dictionary encode all string fields, enable the `string_dictionary_encoding` | ||
option of `TracingOptions` when tracing the fields: | ||
|
||
```rust | ||
let fields = serialize_into_fields( | ||
&items, | ||
TracingOptions::default().string_dictionary_encoding(true), | ||
)?; | ||
``` | ||
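As an illustration of what dictionary encoding does to the data, here is a minimal standard-library sketch. It is not how `serde_arrow` or `arrow2` implement it; the functions `dictionary_encode` and `dictionary_decode` are hypothetical names. The keys are `u32`, mirroring the `IntegerType::UInt32` key type above, and the dictionary is kept in order of first appearance, that is, unsorted.

```rust
use std::collections::HashMap;

/// Dictionary-encode a slice of strings. Returns the lookup table of distinct
/// values (in order of first appearance, so unsorted) and one key per input.
/// Hypothetical helper for illustration, not part of `serde_arrow`.
fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    // Maps each distinct string to its position in the dictionary.
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut dictionary = Vec::new();
    let mut keys = Vec::with_capacity(values.len());

    for &value in values {
        let key = match index.get(value) {
            // Seen before: reuse the existing key.
            Some(&key) => key,
            // First occurrence: append to the dictionary and remember the key.
            None => {
                let key = dictionary.len() as u32;
                index.insert(value, key);
                dictionary.push(value.to_string());
                key
            }
        };
        keys.push(key);
    }
    (dictionary, keys)
}

/// Reverse the encoding by looking up every key in the dictionary.
fn dictionary_decode(dictionary: &[String], keys: &[u32]) -> Vec<String> {
    keys.iter()
        .map(|&key| dictionary[key as usize].clone())
        .collect()
}
```

Encoding `["a", "b", "a", "c", "b"]` this way stores each of the three distinct strings once and replaces the five values with the keys `[0, 1, 0, 2, 1]`.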
|
||
## Convert from arrow2 to arrow arrays | ||
|
||
Both `arrow` and `arrow2` use the Arrow memory format. Thanks to this fact, it | ||
is possible to convert arrays between both packages with minimal work using | ||
their respective FFI interfaces: | ||
|
||
- [arrow2::ffi::export_field_to_c](https://docs.rs/arrow2/latest/arrow2/ffi/fn.export_field_to_c.html) | ||
- [arrow2::ffi::export_array_to_c](https://docs.rs/arrow2/latest/arrow2/ffi/fn.export_array_to_c.html) | ||
- [arrow::ffi::ArrowArray::new](https://docs.rs/arrow/latest/arrow/ffi/struct.ArrowArray.html#method.new) | ||
|
||
A fully worked example can be found in the [arrow2-to-arrow][] example of the | ||
`serde_arrow` repository. | ||
|
||
[arrow2-to-arrow]: https://github.com/chmp/serde_arrow/tree/main/arrow2-to-arrow |