Skip to content

Schema derive macro (reborn) #156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Apr 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
c5f5e81
initial derive macros
nicklan Feb 28, 2024
f2f6e34
simpler version
nicklan Mar 1, 2024
9abe8ea
Update derive-macros/src/lib.rs
nicklan Mar 1, 2024
addb7a3
cleanup get_schema_name_from_attr<
nicklan Mar 1, 2024
7f00ac1
to_ascii_uppercase makes more sense
nicklan Mar 1, 2024
9dcc096
add test
nicklan Mar 1, 2024
6d00e43
fmt
nicklan Mar 1, 2024
48219ee
clippy
nicklan Mar 1, 2024
5e62148
make init static
nicklan Mar 5, 2024
2cd6027
better static via returning SchemaRef
nicklan Mar 5, 2024
746b6f3
use get_schema in actions/mod.rs
nicklan Mar 6, 2024
150b024
use dervied schemas, and fix remove schema
nicklan Mar 6, 2024
c4e037e
tags is optional on Add, use get_schema everywhere
nicklan Mar 6, 2024
c7785c2
remove leftover code in schemas.rs
nicklan Mar 6, 2024
b51b7b1
switch to using "project" to get schemas
nicklan Mar 12, 2024
9bca3f5
remove unneeded annotations
nicklan Mar 12, 2024
0bf4c3a
move snapshot to new way
nicklan Mar 12, 2024
b273e3b
use static names
nicklan Mar 12, 2024
505983e
fix comment
nicklan Mar 12, 2024
0a7c30d
Merge branch 'main' into schema-derive-macro
nicklan Mar 28, 2024
eeee0f3
add docs, remove attribute
nicklan Mar 28, 2024
2467469
use full type path
nicklan Mar 28, 2024
2b9bccb
fix quoting joining with ::
nicklan Mar 28, 2024
effdd09
switch order to be sure it doesn't matter
nicklan Mar 28, 2024
5d87b4e
fix doc comment
nicklan Mar 29, 2024
eb9ecce
don't need to collect
nicklan Mar 29, 2024
78342ad
make all hashmaps <string,string>, not <string, option<string>>
nicklan Mar 29, 2024
ffa7d63
fix in acceptance
nicklan Mar 29, 2024
1152b8c
switch to ToDataType
nicklan Mar 29, 2024
1e624b4
use passed order to `project` and fix column reordering bug
nicklan Mar 30, 2024
2f419f1
remove expensive parquet schema search
nicklan Mar 30, 2024
c69f7ce
no need for -1s
nicklan Apr 2, 2024
84f3d33
const over static
nicklan Apr 2, 2024
376439d
better comment, and don't need i32s anymore
nicklan Apr 2, 2024
58ef6e0
add automatically_derived attribute to derived impl
nicklan Apr 2, 2024
12a5241
doc fixup and comment about order
nicklan Apr 3, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
[workspace]
members = [
"acceptance",
"derive-macros",
"ffi",
"kernel",
"kernel/examples/dump-table", # todo: put back to `examples/*` when inspect-table is fixed
Expand Down
2 changes: 1 addition & 1 deletion acceptance/src/meta.rs
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ impl TestCaseInfo {
properties: metadata
.configuration
.iter()
.map(|(k, v)| (k.clone(), v.clone().unwrap()))
.map(|(k, v)| (k.clone(), v.clone()))
.collect(),
min_reader_version: protocol.min_reader_version as u32,
min_writer_version: protocol.min_writer_version as u32,
Expand Down
20 changes: 20 additions & 0 deletions derive-macros/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
[package]
name = "derive-macros"
authors.workspace = true
edition.workspace = true
homepage.workspace = true
keywords.workspace = true
license.workspace = true
repository.workspace = true
readme.workspace = true
version.workspace = true

[lib]
proc-macro = true

[dependencies]
proc-macro2 = "1"
syn = { version = "2.0", features = ["extra-traits"] }
quote = "1.0"


85 changes: 85 additions & 0 deletions derive-macros/src/lib.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
use proc_macro2::{Ident, TokenStream};
use quote::{quote, quote_spanned};
use syn::spanned::Spanned;
use syn::{parse_macro_input, Data, DataStruct, DeriveInput, Fields, PathArguments, Type};

/// Derive a `deltakernel::schemas::ToDataType` implementation for the annotated struct. The actual
/// field names in the schema (and therefore of the struct members) are all mandated by the Delta
/// spec, and so the user of this macro is responsible for ensuring that
/// e.g. `Metadata::schema_string` is the snake_case-ified version of `schemaString` from [Delta's
/// Change Metadata](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#change-metadata)
/// action (this macro allows the use of standard rust snake_case, and will convert to the correct
/// delta schema camelCase version).
#[proc_macro_derive(Schema)]
pub fn derive_schema(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
Copy link
Collaborator Author

@nicklan nicklan Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment above added to address https://github.com/delta-incubator/delta-kernel-rs/pull/129/files#r1527171512

This will show up in the docs for the derive-macros crate as:
image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted you could also omit the crate from docs with #[doc(hidden)]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of consider the crate docs like "implementer" docs. Like someone writing a connector would read them. I think understanding what this macro does is useful from that perspective, so likely it should be left in.

let input = parse_macro_input!(input as DeriveInput);
let struct_ident = input.ident;

let schema_fields = gen_schema_fields(&input.data);
let output = quote! {
#[automatically_derived]
impl crate::actions::schemas::ToDataType for #struct_ident {
fn to_data_type() -> crate::schema::DataType {
use crate::actions::schemas::{ToDataType, GetStructField};
crate::schema::StructType::new(vec![
#schema_fields
]).into()
}
}
};
proc_macro::TokenStream::from(output)
}

// turn our struct name into the schema name, goes from snake_case to camelCase
fn get_schema_name(name: &Ident) -> Ident {
let snake_name = name.to_string();
let mut next_caps = false;
let ret: String = snake_name
.chars()
.filter_map(|c| {
if c == '_' {
next_caps = true;
None
} else if next_caps {
next_caps = false;
// This assumes we're using ascii, should be okay
Some(c.to_ascii_uppercase())
} else {
Some(c)
}
})
.collect();
Ident::new(&ret, name.span())
}

fn gen_schema_fields(data: &Data) -> TokenStream {
let fields = match data {
Data::Struct(DataStruct {
fields: Fields::Named(fields),
..
}) => &fields.named,
_ => panic!("this derive macro only works on structs with named fields"),
};

let schema_fields = fields.iter().map(|field| {
let name = field.ident.as_ref().unwrap(); // we know these are named fields
let name = get_schema_name(name);
match field.ty {
Type::Path(ref type_path) => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rescuing https://github.com/delta-incubator/delta-kernel-rs/pull/129/files#r1527170864:

Don't we need to emit the fully qualified type name, in case the user didn't use the (full) path to it?
(especially since, if I understand correctly, this is an unresolved token stream, so any qualifiers the user gave are probably needed for it to compile at all)

I've changed it here to join the full type path that was specified at the derive site. This means that if they have useed it, it will just be the name of the type, and if they have put::a:full::Path, that will happen here too

let type_path_quoted = type_path.path.segments.iter().map(|segment| {
let segment_ident = &segment.ident;
match &segment.arguments {
PathArguments::None => quote! { #segment_ident :: },
PathArguments::AngleBracketed(angle_args) => quote! { #segment_ident::#angle_args :: },
_ => panic!("Can only handle <> type path args"),
}
});
quote_spanned! { field.span() => #(#type_path_quoted),* get_struct_field(stringify!(#name))}
}
_ => {
panic!("Can't handle type: {:?}", field.ty);
}
}
});
quote! { #(#schema_fields),* }
}
3 changes: 3 additions & 0 deletions kernel/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,9 @@ url = "2"
uuid = "1.3.0"
z85 = "3.0.5"

# bring in our derive macros
derive-macros = { path = "../derive-macros" }

# used for developer-visibility
visibility = "0.1.0"

Expand Down
3 changes: 2 additions & 1 deletion kernel/src/actions/deletion_vector.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,13 @@ use std::io::{Cursor, Read};
use std::sync::Arc;

use bytes::Bytes;
use derive_macros::Schema;
use roaring::RoaringTreemap;
use url::Url;

use crate::{DeltaResult, Error, FileSystemClient};

#[derive(Debug, Clone, PartialEq, Eq)]
#[derive(Debug, Clone, PartialEq, Eq, Schema)]
pub struct DeletionVectorDescriptor {
/// A single character to indicate how to access the DV. Legal options are: ['u', 'i', 'p'].
pub storage_type: String,
Expand Down
Loading