Example: FFI Table Provider as dynamic module loading #13183

Merged
merged 10 commits into from
Nov 5, 2024
3 changes: 3 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -47,6 +47,9 @@ members = [
    "datafusion/substrait",
    "datafusion/wasmtest",
    "datafusion-examples",
    "datafusion-examples/examples/ffi/ffi_example_table_provider",
    "datafusion-examples/examples/ffi/ffi_module_interface",
    "datafusion-examples/examples/ffi/ffi_module_loader",
    "test-utils",
    "benchmarks",
]
48 changes: 48 additions & 0 deletions datafusion-examples/examples/ffi/README.md
@@ -0,0 +1,48 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Example FFI Usage

The purpose of these crates is to provide an example of how one can use the
DataFusion Foreign Function Interface (FFI). See [API Docs] for detailed
usage.

This example is broken into three crates.

- `ffi_module_interface` is a common library to be shared by both the module
  to be loaded and the program that will load it. It defines how the module
  is to be structured.
- `ffi_example_table_provider` creates a library that exposes the module.
- `ffi_module_loader` is an example program that loads the module, gets data
  from it, and displays this data to the user.
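
The three-way split described above can be sketched in plain Rust without any FFI crates. This is only an illustration under stated assumptions: the three roles are modules in a single file rather than separate crates, a bare `i32` stands in for `FFI_TableProvider`, and the names `ExampleModule`, `construct_value`, `get_module`, and `load_and_call` are hypothetical. The real example relies on `abi_stable` to load the library at run time, which this sketch does not model.

```rust
/// Stand-in for `ffi_module_interface`: the contract both sides agree on.
mod interface {
    /// A C-compatible table of function pointers (hypothetical type).
    #[repr(C)]
    pub struct ExampleModule {
        pub create_value: extern "C" fn() -> i32,
    }
}

/// Stand-in for `ffi_example_table_provider`: implements the contract.
mod provider {
    use super::interface::ExampleModule;

    /// The function the module exposes through the function table.
    extern "C" fn construct_value() -> i32 {
        42
    }

    /// Entry point that hands the function table to the caller.
    pub fn get_module() -> ExampleModule {
        ExampleModule {
            create_value: construct_value,
        }
    }
}

/// Stand-in for `ffi_module_loader`: depends only on the interface type.
mod loader {
    use super::interface::ExampleModule;

    /// Calls through the function pointer without knowing who implemented it.
    pub fn load_and_call(module: &ExampleModule) -> i32 {
        (module.create_value)()
    }
}
```

Note that `loader` never names anything from `provider`; it only sees the shared `interface` type, which is the separation the three crates aim for.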

## Building and running

In order for the program to run successfully, the module to be loaded must be
built first. This example expects both the module and the program to be
built using the same build mode (debug or release).

```shell
cd ffi_example_table_provider
cargo build
cd ../ffi_module_loader
cargo run
```
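
To make the build-mode requirement concrete, here is a minimal sketch of how a loader could derive the library location from a build mode. It assumes default Cargo profiles (output in `target/debug` or `target/release`), and the helper names `library_file_name`, `expected_library_path`, and `built_in_release_mode` are hypothetical. It is not a description of how `abi_stable` resolves the path.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};
use std::path::{Path, PathBuf};

/// Platform-specific file name of a dynamic library,
/// for example `libfoo.so` on Linux or `foo.dll` on Windows.
fn library_file_name(base_name: &str) -> String {
    format!("{}{}{}", DLL_PREFIX, base_name, DLL_SUFFIX)
}

/// Path where Cargo is assumed to place the library for the given build mode.
fn expected_library_path(target_dir: &Path, release: bool, base_name: &str) -> PathBuf {
    let profile = if release { "release" } else { "debug" };
    target_dir.join(profile).join(library_file_name(base_name))
}

/// Build mode of the running program, assuming default profile settings
/// (debug assertions on in debug builds, off in release builds).
fn built_in_release_mode() -> bool {
    !cfg!(debug_assertions)
}
```

A loader could pass `built_in_release_mode()` as the `release` argument so that a debug program looks for a debug library and a release program looks for a release one.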

[api docs]: http://docs.rs/datafusion-ffi/latest
@@ -0,0 +1,35 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "ffi_example_table_provider"
version = "0.1.0"
edition = { workspace = true }
publish = false

[dependencies]
abi_stable = "0.11.3"
arrow = { workspace = true }
arrow-array = { workspace = true }
arrow-schema = { workspace = true }
datafusion = { workspace = true }
datafusion-ffi = { workspace = true }
ffi_module_interface = { path = "../ffi_module_interface" }

[lib]
name = "ffi_example_table_provider"
crate-type = ["cdylib", "rlib"]
@@ -0,0 +1,66 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use std::sync::Arc;

use abi_stable::{export_root_module, prefix_type::PrefixTypeTrait};
use arrow_array::RecordBatch;
use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    common::record_batch,
    datasource::MemTable,
};
use datafusion_ffi::table_provider::FFI_TableProvider;
use ffi_module_interface::{TableProviderModule, TableProviderModuleRef};

fn create_record_batch(start_value: i32, num_values: usize) -> RecordBatch {
    let end_value = start_value + num_values as i32;
    let a_vals: Vec<i32> = (start_value..end_value).collect();
    let b_vals: Vec<f64> = a_vals.iter().map(|v| *v as f64).collect();

    record_batch!(("a", Int32, a_vals), ("b", Float64, b_vals)).unwrap()
}

/// Here we only wish to create a simple table provider as an example.
/// We create an in-memory table and convert it to its FFI counterpart.
extern "C" fn construct_simple_table_provider() -> FFI_TableProvider {
    let schema = Arc::new(Schema::new(vec![
        Field::new("a", DataType::Int32, true),
        Field::new("b", DataType::Float64, true),
    ]));

    // It is useful to create these as multiple record batches
    // so that we can demonstrate the FFI stream.
    let batches = vec![
        create_record_batch(1, 5),
        create_record_batch(6, 1),
        create_record_batch(7, 5),
    ];

    let table_provider = MemTable::try_new(schema, vec![batches]).unwrap();

    FFI_TableProvider::new(Arc::new(table_provider), true)
}

#[export_root_module]
/// This defines the entry point for using the module.
pub fn get_simple_memory_table() -> TableProviderModuleRef {
    TableProviderModule {
        create_table: construct_simple_table_provider,
    }
    .leak_into_prefix()
}
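
To see what data the three record batches above contain, the following plain-Rust sketch mirrors the arithmetic of `create_record_batch` without Arrow. The `Batch` struct and the functions `create_batch`, `example_batches`, and `all_a_values` are hypothetical stand-ins, not DataFusion types; the sketch only illustrates that the batches hold the contiguous values 1 through 11 split unevenly across three chunks.

```rust
/// Plain-Rust stand-in for one record batch: two equally long columns.
struct Batch {
    a: Vec<i32>,
    b: Vec<f64>,
}

/// Mirrors `create_record_batch`: column `a` counts up from `start_value`,
/// column `b` holds the same values as floating point numbers.
fn create_batch(start_value: i32, num_values: usize) -> Batch {
    let end_value = start_value + num_values as i32;
    let a: Vec<i32> = (start_value..end_value).collect();
    let b: Vec<f64> = a.iter().map(|v| f64::from(*v)).collect();
    Batch { a, b }
}

/// The same three (start, length) pairs used by the example module.
fn example_batches() -> Vec<Batch> {
    vec![create_batch(1, 5), create_batch(6, 1), create_batch(7, 5)]
}

/// Concatenates column `a` across batches, as a consumer of the stream would see it.
fn all_a_values(batches: &[Batch]) -> Vec<i32> {
    batches
        .iter()
        .flat_map(|batch| batch.a.iter().copied())
        .collect()
}
```

Because the batch sizes differ (5, 1, and 5 rows), a consumer that reads the table has to handle more than one batch, which is the point of splitting the data in the example.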
26 changes: 26 additions & 0 deletions datafusion-examples/examples/ffi/ffi_module_interface/Cargo.toml
@@ -0,0 +1,26 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "ffi_module_interface"
version = "0.1.0"
edition = "2021"
publish = false

[dependencies]
abi_stable = "0.11.3"
datafusion-ffi = { workspace = true }
49 changes: 49 additions & 0 deletions datafusion-examples/examples/ffi/ffi_module_interface/src/lib.rs
@@ -0,0 +1,49 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use abi_stable::{
    declare_root_module_statics,
    library::{LibraryError, RootModule},
    package_version_strings,
    sabi_types::VersionStrings,
    StableAbi,
};
use datafusion_ffi::table_provider::FFI_TableProvider;

#[repr(C)]
#[derive(StableAbi)]
#[sabi(kind(Prefix(prefix_ref = TableProviderModuleRef)))]
Contributor

I don't fully understand the need for this interface module.

Couldn't the caller program simply use `construct_simple_table_provider` directly? (It would have to know the name of the entry point, of course, which is less general than what you have here.)

Contributor Author

You don't actually need it, but it does give a separation of concerns where you have one crate for the interface, so that module implementers don't need the executable as a dependency. I'm mostly following the same approach as the examples in the `abi_stable` crate, but it's also a model I've used in other places.


/// This struct defines the module interfaces. It is to be shared by
/// both the module loading program and library that implements the
/// module. It is possible to move this definition into the loading
/// program and reference it in the modules, but this example shows
/// how a user may wish to separate these concerns.
pub struct TableProviderModule {
    /// Constructs the table provider
    pub create_table: extern "C" fn() -> FFI_TableProvider,
}

impl RootModule for TableProviderModuleRef {
    declare_root_module_statics! {TableProviderModuleRef}
    const BASE_NAME: &'static str = "ffi_example_table_provider";
    const NAME: &'static str = "ffi_example_table_provider";
    const VERSION_STRINGS: VersionStrings = package_version_strings!();

    fn initialization(self) -> Result<Self, LibraryError> {
        Ok(self)
    }
}
29 changes: 29 additions & 0 deletions datafusion-examples/examples/ffi/ffi_module_loader/Cargo.toml
@@ -0,0 +1,29 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "ffi_module_loader"
version = "0.1.0"
edition = "2021"
publish = false

[dependencies]
abi_stable = "0.11.3"
datafusion = { workspace = true }
datafusion-ffi = { workspace = true }
ffi_module_interface = { path = "../ffi_module_interface" }
tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot"] }
63 changes: 63 additions & 0 deletions datafusion-examples/examples/ffi/ffi_module_loader/src/main.rs
@@ -0,0 +1,63 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use std::sync::Arc;

use datafusion::{
    error::{DataFusionError, Result},
    prelude::SessionContext,
};

use abi_stable::library::{development_utils::compute_library_path, RootModule};
use datafusion_ffi::table_provider::ForeignTableProvider;
use ffi_module_interface::TableProviderModuleRef;

#[tokio::main]
async fn main() -> Result<()> {
    // Find the location of the library. This is specific to the build environment,
    // so you will need to change the approach here based on your use case.
    let target: &std::path::Path = "../../../../target/".as_ref();
    let library_path = compute_library_path::<TableProviderModuleRef>(target)
        .map_err(|e| DataFusionError::External(Box::new(e)))?;

    // Load the module
    let table_provider_module =
        TableProviderModuleRef::load_from_directory(&library_path)
            .map_err(|e| DataFusionError::External(Box::new(e)))?;

    // By calling the code below, the table provider will be created within
    // the module's code.
    let ffi_table_provider =
        table_provider_module
            .create_table()
            .ok_or(DataFusionError::NotImplemented(
                "External table provider failed to implement create_table".to_string(),
            ))?();

    // In order to access the table provider within this executable, we need to
    // turn it into a `ForeignTableProvider`.
    let foreign_table_provider: ForeignTableProvider = (&ffi_table_provider).into();

    let ctx = SessionContext::new();

    // Display the data to show the full cycle works.
    ctx.register_table("external_table", Arc::new(foreign_table_provider))?;
    let df = ctx.table("external_table").await?;
    df.show().await?;

    Ok(())
}
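
The loader above treats `create_table()` as returning an `Option` and turns a missing function into an error before calling it. That pattern can be sketched with the standard library alone. Everything here is a hypothetical stand-in: `LoadedModule`, `LoadError`, `construct_value`, and `call_create_value` are not `abi_stable` or DataFusion types, and the sketch does not model the prefix-type rules that decide when the accessor yields `None`.

```rust
use std::fmt;

/// Hypothetical error type standing in for `DataFusionError::NotImplemented`.
#[derive(Debug, PartialEq)]
enum LoadError {
    NotImplemented(String),
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::NotImplemented(msg) => write!(f, "not implemented: {msg}"),
        }
    }
}

/// Hypothetical loaded module whose entry point may be absent.
struct LoadedModule {
    create_value: Option<extern "C" fn() -> i32>,
}

/// Example entry point a module could provide.
extern "C" fn construct_value() -> i32 {
    7
}

/// Fails with an error if the entry point is missing, otherwise calls it.
fn call_create_value(module: &LoadedModule) -> Result<i32, LoadError> {
    let create_value = module.create_value.ok_or_else(|| {
        LoadError::NotImplemented("module failed to implement create_value".to_string())
    })?;
    Ok(create_value())
}
```

Using `ok_or_else` instead of `ok_or` builds the error message only when the function is actually missing; either form gives the same result.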