[RFC] feat!: kernel-based log replay #3137

Draft
wants to merge 30 commits into
base: main
30 commits
8e8378b
chore: setup dat test scaffolding
roeap Jan 10, 2025
333198c
feat: file action replay
roeap Jan 11, 2025
0f2c1c4
feat: add objectstore with commit file caching
roeap Jan 13, 2025
55565ae
feat: add owned file view
roeap Jan 13, 2025
3d6d263
feat: basic updates of file state
roeap Jan 16, 2025
a5672b5
feat: introduce snapshot trait
roeap Jan 16, 2025
b0f794f
test: run some more dat tests
roeap Jan 16, 2025
7a559ac
feat: add commit infos apis to new snapshots
roeap Jan 18, 2025
adb9df8
feat: snapshot updates and improved file data iterators
roeap Jan 19, 2025
f3b0edb
fix: consistent schemas in file replay and object safe snapshot trait
roeap Jan 21, 2025
e83c3ca
test: more snapshot tests
roeap Jan 21, 2025
5364f4a
feat: allow iterating over logical files
roeap Jan 21, 2025
51349f4
fix: revert accidentally committed file
roeap Jan 21, 2025
5d2cf48
fix: tombstone replay
roeap Jan 21, 2025
9e5f1fb
fix: handle unknown features
roeap Apr 8, 2025
6c71705
fix: update to latest kernel state
roeap Apr 8, 2025
bde3b34
test: update or disable tests with unsupported features
roeap Apr 8, 2025
a2464b7
Merge branch 'fix/exotic-logs' into feat/kernel-data
roeap Apr 9, 2025
98d7c9a
feat: first round of latest kernel
roeap Apr 11, 2025
e279555
fix: MRSV
roeap Apr 11, 2025
7004eb7
fix: eager test
roeap Apr 11, 2025
1e2f838
merge main
roeap Apr 11, 2025
15afe77
chore: clippy
roeap Apr 11, 2025
6c143c1
chore: merge clippy
roeap Apr 11, 2025
0a5798a
refactor: move transaction module to kernel
roeap Apr 11, 2025
886d30e
merge move-transactions
roeap Apr 11, 2025
dcc60e5
refactor: move transaction module to kernel
roeap Apr 11, 2025
32311f2
merge move-transactions
roeap Apr 11, 2025
240ce2c
merge main
roeap Apr 12, 2025
0eac792
Merge branch 'main' into feat/kernel-data
roeap Apr 12, 2025
26 changes: 26 additions & 0 deletions .github/actions/load-dat/action.yaml
@@ -0,0 +1,26 @@
name: Delta Acceptance Tests
description: Load Delta Lake acceptance test data

inputs:
version:
description: "The DAT release version to download"
required: false
default: "0.0.3"

target-directory:
description: target directory for acceptance test data
required: false
default: ${{ github.workspace }}/dat

runs:
using: composite

steps:
- name: load DAT
shell: bash
run: |
rm -rf ${{ inputs.target-directory }}
curl -OL https://github.com/delta-incubator/dat/releases/download/v${{ inputs.version }}/deltalake-dat-v${{ inputs.version }}.tar.gz
mkdir -p ${{ inputs.target-directory }}
tar --no-same-permissions -xzf deltalake-dat-v${{ inputs.version }}.tar.gz --directory ${{ inputs.target-directory }}
rm deltalake-dat-v${{ inputs.version }}.tar.gz
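The download step above derives both the release URL and the archive name from the single `version` input. A minimal Rust sketch of that naming scheme follows. The helper names `dat_archive_name` and `dat_release_url` are hypothetical and not part of this PR; only the URL pattern and the default version `0.0.3` come from the action itself.

```rust
/// Hypothetical helper (not part of the PR): archive name used by the
/// load-dat action for a given DAT release version.
fn dat_archive_name(version: &str) -> String {
    format!("deltalake-dat-v{}.tar.gz", version)
}

/// Hypothetical helper (not part of the PR): release URL that the action's
/// `curl -OL` step downloads for a given DAT release version.
fn dat_release_url(version: &str) -> String {
    format!(
        "https://github.com/delta-incubator/dat/releases/download/v{}/{}",
        version,
        dat_archive_name(version)
    )
}

fn main() {
    // "0.0.3" is the default of the action's `version` input.
    println!("{}", dat_release_url("0.0.3"));
}
```

Keeping the version in one place means a workflow that wants a different DAT release only has to override the `version` input.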
4 changes: 2 additions & 2 deletions .github/actions/setup-env/action.yml
@@ -4,12 +4,12 @@ description: "Set up Python, virtual environment, and Rust toolchain"
inputs:
python-version:
description: "The Python version to set up"
required: true
required: false
default: "3.10"

rust-toolchain:
description: "The Rust toolchain to set up"
required: true
required: false
default: "stable"

runs:
19 changes: 12 additions & 7 deletions .github/workflows/build.yml
@@ -25,7 +25,7 @@ jobs:
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: '1.82'
toolchain: "1.82"
override: true

- name: Build
@@ -40,7 +40,7 @@ jobs:
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: '1.82'
toolchain: "1.82"
override: true

- name: Format
@@ -62,7 +62,7 @@ jobs:
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: '1.82'
toolchain: "1.82"
override: true

- name: build and lint with clippy
@@ -92,9 +92,12 @@ jobs:
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: '1.82'
toolchain: "1.82"
override: true

- name: Load DAT data
uses: ./.github/actions/load-dat

- name: Run tests
run: cargo test --verbose --features ${{ env.DEFAULT_FEATURES }}

@@ -121,7 +124,7 @@ jobs:
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: '1.82'
toolchain: "1.82"
override: true

# Install Java and Hadoop for HDFS integration tests
@@ -136,6 +139,9 @@ jobs:
tar -xf hadoop-3.4.0.tar.gz -C $GITHUB_WORKSPACE
echo "$GITHUB_WORKSPACE/hadoop-3.4.0/bin" >> $GITHUB_PATH

- name: Load DAT data
uses: ./.github/actions/load-dat

- name: Start emulated services
run: docker compose up -d

@@ -162,7 +168,7 @@ jobs:
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: '1.82'
toolchain: "1.82"
override: true

- name: Download Lakectl
@@ -177,4 +183,3 @@ jobs:
- name: Run tests with rustls (default)
run: |
cargo test --features integration_test_lakefs,lakefs,datafusion

8 changes: 8 additions & 0 deletions .github/workflows/codecov.yml
@@ -16,17 +16,25 @@ jobs:
CARGO_TERM_COLOR: always
steps:
- uses: actions/checkout@v4

- name: Install rust
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: '1.82'
override: true

- name: Install cargo-llvm-cov
uses: taiki-e/install-action@cargo-llvm-cov

- uses: Swatinem/rust-cache@v2

- name: Load DAT data
uses: ./.github/actions/load-dat

- name: Generate code coverage
run: cargo llvm-cov --features ${DEFAULT_FEATURES} --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs --skip test_read_tables_lakefs

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4
with:
3 changes: 2 additions & 1 deletion .gitignore
@@ -22,7 +22,7 @@ __blobstorage__
.githubchangeloggenerator.cache.log
.githubchangeloggenerator.cache/
.githubchangeloggenerator*
data
.zed/

# Add all Cargo.lock files except for those in binary crates
Cargo.lock
@@ -33,5 +33,6 @@ Cargo.lock
justfile
site
__pycache__
dat/
.zed
.zed/
7 changes: 6 additions & 1 deletion Cargo.toml
@@ -26,11 +26,16 @@ debug = true
debug = "line-tables-only"

[workspace.dependencies]
delta_kernel = { version = "0.9.0", features = [
delta_kernel = { git = "https://github.com/roeap/delta-kernel-rs", rev = "b0cd12264ae4ada8d51cff02b25864258568eb88", features = [
"arrow_54",
"developer-visibility",
"default-engine-rustls",
] }
# delta_kernel = { path = "../delta-kernel-rs/kernel", features = [
# "arrow_54",
# "developer-visibility",
# "default-engine-rustls",
# ] }

# arrow
arrow = { version = "54" }
18 changes: 12 additions & 6 deletions crates/core/Cargo.toml
@@ -15,7 +15,7 @@ rust-version.workspace = true
features = ["datafusion", "json", "unity-experimental"]

[dependencies]
delta_kernel.workspace = true
delta_kernel = { workspace = true }

# arrow
arrow = { workspace = true }
@@ -29,10 +29,7 @@ arrow-ord = { workspace = true }
arrow-row = { workspace = true }
arrow-schema = { workspace = true, features = ["serde"] }
arrow-select = { workspace = true }
parquet = { workspace = true, features = [
"async",
"object_store",
] }
parquet = { workspace = true, features = ["async", "object_store"] }
pin-project-lite = "^0.2.7"

# datafusion
@@ -49,7 +46,7 @@ datafusion-functions-aggregate = { workspace = true, optional = true }
# serde
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
strum = { workspace = true}
strum = { workspace = true }

# "stdlib"
bytes = { workspace = true }
@@ -75,6 +72,9 @@ tokio = { workspace = true, features = [
"parking_lot",
] }

# cache
quick_cache = { version = "0.6.9" }

# other deps (these should be organized and pulled into workspace.dependencies as necessary)
cfg-if = "1"
dashmap = "6"
@@ -100,6 +100,7 @@ humantime = { version = "2.1.0" }
[dev-dependencies]
criterion = "0.5"
ctor = "0"
datatest-stable = "0.2"
deltalake-test = { path = "../test", features = ["datafusion"] }
dotenvy = "0"
fs_extra = "1.2.0"
@@ -130,3 +131,8 @@ python = ["arrow/pyarrow"]
native-tls = ["delta_kernel/default-engine"]
rustls = ["delta_kernel/default-engine-rustls"]
cloud = ["object_store/cloud"]

[[test]]
name = "dat"
harness = false
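
With `harness = false`, Cargo does not link the default libtest runner, so the `dat` test target has to provide its own `main`. The PR adds `datatest-stable` as a dev-dependency for this. The sketch below is not that crate's API: it is a std-only illustration of what a data-driven harness does, with hypothetical names (`discover_cases`, `run_cases`) and an assumed layout where each sub-directory of `dat/` is one test case containing a `delta` directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect the immediate sub-directories of `root`, sorted for stable ordering.
/// Treating each sub-directory as one test case is an assumption of this sketch.
fn discover_cases(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut cases = Vec::new();
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            cases.push(path);
        }
    }
    cases.sort();
    Ok(cases)
}

/// Run `check` on every case, print one line per case, and return the number of failures.
fn run_cases(cases: &[PathBuf], check: impl Fn(&Path) -> Result<(), String>) -> usize {
    let mut failed = 0;
    for case in cases {
        match check(case) {
            Ok(()) => println!("ok      {}", case.display()),
            Err(err) => {
                failed += 1;
                println!("FAILED  {}: {}", case.display(), err);
            }
        }
    }
    failed
}

fn main() -> io::Result<()> {
    // Hypothetical location: the load-dat action extracts into `<workspace>/dat` by default.
    let root = Path::new("dat");
    if !root.is_dir() {
        println!("no DAT data found at {}, skipping", root.display());
        return Ok(());
    }
    let cases = discover_cases(root)?;
    // Placeholder check (assumed layout): a real test would read the table
    // and compare it against the expected data shipped with the case.
    let failed = run_cases(&cases, |case| {
        if case.join("delta").is_dir() {
            Ok(())
        } else {
            Err("missing delta directory".to_string())
        }
    });
    if failed > 0 {
        // A non-zero exit code is how a custom harness reports failure to Cargo.
        std::process::exit(1);
    }
    Ok(())
}
```

The trade-off of a custom harness is that test cases are generated from files on disk rather than from `#[test]` functions, so new acceptance data is picked up without code changes.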

1 change: 1 addition & 0 deletions crates/core/src/kernel/mod.rs
@@ -10,6 +10,7 @@ pub mod error;
pub mod models;
pub mod scalars;
mod snapshot;
pub mod snapshot_next;
pub mod transaction;

pub use error::*;