Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(spark): add support for more types and literals (binary, list, map, intervals, timestamps) #311
feat(spark): add support for more types and literals (binary, list, map, intervals, timestamps) #311
Changes from all commits
14a6b5f
410475f
fd01c2f
eaf5381
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes producing from the deprecated timestamp type to the new precision-aware version. That may break some consumers if they haven't been updated yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling
containsNull
here is non-trivial, since Substrait doesn't have such a concept. The nullability of the non-empty list is taken from the nullability of the element type, which is gotten from the first element. So we'll need to encodecontainsNull
into theelementType
. It can be done, but requires some more work (as there's no "withNullable" method or anything) and this PR is already big enough, so I'll do it as a followup.Same applies for
map
belowThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure what you mean here.
Is this when going from Substrait to Spark or from Spark to Substrait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Substrait itself doesn't contain the "containsNull" arg, so the Substrait -> Spark conversion has to infer it, and the Spark -> Substrait conversion doesn't provide the correct information to infer it based on.
Easiest to see from the commented-out test:
Starting with Spark:
value: [1,null,3] type: ArrayType(IntegerType,true)
Gets converted into Substrait literal:
ListLiteral{nullable=false, values=[I32Literal{nullable=false, value=1}, NullLiteral{nullable=false, type=I32{nullable=true}}, I32Literal{nullable=false, value=3}]}
Note that there is no "containsNull" anywhere. So the way we infer it when converting back into Substrait is to look at the first element and it's
nullable
arg. Which in today's world is alwaysnullabe=false
, leading to:Spark:
value: [1,null,3] type: ArrayType(IntegerType,false)
(Funnily enough that's the case even if the first element is a null.. but fixing that in itself wouldn't be enough, since the null might be elsewhere in the list or even in a different row, but we should have the "type" of the literal match across rows.)
I think fixing this requires changing the
ToSubstraitLiteral.convert/convertWithValue
so that the nullability they set can be overridden in this case.