Error for table with checkpoint #669
Comments
Are we sure this isn't a bug with delta-rs? To my understanding, delta-rs does not speak deletion vectors.
@hntd187 good point, doing something similar with pyspark is fine! Using this to generate a table with a bunch of appends:
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from delta import *
from pyspark.sql.functions import *

builder = SparkSession.builder.appName("MyApp") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.driver.memory", "8g") \
    .config('spark.driver.host','127.0.0.1')
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Create the table, then append 100 single-row commits to it
spark.sql(f"CREATE TABLE test_table USING delta LOCATION './repro_spark' AS SELECT '1' as a;")
for i in range(0,100):
    df = spark.createDataFrame(
        data=[
            Row(a=f"{i}"),
        ],
        schema=StructType([
            StructField(name="a", dataType=StringType())
        ])
    )
    df.write.format("delta").mode("append").save("./repro_spark")
yields a delta table that reads just fine. Seems pretty plausible that delta-rs writes some deletion vector type field incorrectly in the checkpoint file.
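To make the suspected failure mode concrete, here is a minimal Python sketch. It is not the kernel's or delta-rs's code: `classify_dv` and the sample descriptors are hypothetical. It relies on the Delta protocol defining three deletion vector storage types (`u`, `i`, `p`), and it assumes (unconfirmed in this thread) that the checkpoint contains a non-null descriptor filled with empty defaults where a null was expected.

```python
from typing import Optional

# Storage types the Delta protocol defines for deletion vector descriptors.
VALID_STORAGE_TYPES = {"u", "i", "p"}


def classify_dv(descriptor: Optional[dict]) -> str:
    """Hypothetical helper: classify an add action's deletionVector value."""
    if descriptor is None:
        # A null descriptor means the file has no deletion vector.
        return "no deletion vector"
    storage_type = descriptor.get("storageType")
    if storage_type not in VALID_STORAGE_TYPES:
        # Mirrors the wording of the error reported in this issue.
        raise ValueError(f"Unknown storage format: '{storage_type}'")
    return f"deletion vector with storage type '{storage_type}'"


# Made-up descriptors, for illustration only.
absent = None
defaulted = {"storageType": "", "pathOrInlineDv": "", "sizeInBytes": 0, "cardinality": 0}

print(classify_dv(absent))
try:
    classify_dv(defaulted)
except ValueError as err:
    print(err)
```

If this assumption holds, a reader that treats any non-null descriptor as a real deletion vector would fail on the second case, while a table whose checkpoint stores a null there would read fine.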
This seems to be the same issue as delta-io/delta-rs#3211.
I don't think this is due to JSON.
Describe the bug
When using kernel 0.6.1 in the DuckDB delta extension, the kernel returns an error from ffi::selection_vector_from_dv: Deletion Vector error: Unknown storage format: ''.
This seems to be caused by checkpoint files, because I can only reproduce it for tables with a checkpoint.
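One quick way to confirm whether a given table has a checkpoint is to look in its `_delta_log` directory. The sketch below uses only the Python standard library; `checkpoint_info` is a hypothetical helper, and it assumes the table lives on the local filesystem. It relies on the Delta protocol's conventions that checkpoint file names contain `.checkpoint` and that `_last_checkpoint` is a small JSON file with a `version` field.

```python
import json
from pathlib import Path


def checkpoint_info(table_path: str) -> dict:
    """Hypothetical helper: report checkpoint files found in a table's _delta_log."""
    log_dir = Path(table_path) / "_delta_log"
    # Matches single-file, multi-part, and UUID-named checkpoint files.
    files = sorted(p.name for p in log_dir.glob("*.checkpoint*"))
    last = log_dir / "_last_checkpoint"
    last_version = json.loads(last.read_text())["version"] if last.exists() else None
    return {"checkpoint_files": files, "last_checkpoint_version": last_version}


if __name__ == "__main__":
    # "./repro_table" is the table path named in this issue.
    if Path("./repro_table/_delta_log").is_dir():
        print(checkpoint_info("./repro_table"))
```

An empty `checkpoint_files` list together with a `None` version would indicate a table that has not been checkpointed yet, which per the report above should not trigger the error.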
To Reproduce
Generate the test table ./repro_table.
Build the latest main of the DuckDB delta extension.
Then query the table using:
Expected behavior
No response
Additional context
No response