I was trying to load a Parquet version of ImageNet that contains a "label" column. The value of the column is a string for the train split and null for the test and val splits. While loading the dataset, I got an assertion error:
AssertionError: Types mismatch: null != string
Here's the traceback for the repro below:
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File "/Users/balaji/ray/python/ray/data/_internal/execution/operators/map_transformer.py", line 398, in __call__
    yield output_buffer.next()
  File "/Users/balaji/ray/python/ray/data/_internal/output_buffer.py", line 73, in next
    block_to_yield = self._buffer.build()
  File "/Users/balaji/ray/python/ray/data/_internal/delegating_block_builder.py", line 68, in build
    return self._builder.build()
  File "/Users/balaji/ray/python/ray/data/_internal/table_block.py", line 133, in build
    return self._concat_tables(tables)
  File "/Users/balaji/ray/python/ray/data/_internal/arrow_block.py", line 149, in _concat_tables
    return transform_pyarrow.concat(tables)
  File "/Users/balaji/ray/python/ray/data/_internal/arrow_ops/transform_pyarrow.py", line 308, in concat
    col = _concatenate_chunked_arrays(col_chunked_arrays)
  File "/Users/balaji/ray/python/ray/data/_internal/arrow_ops/transform_pyarrow.py", line 201, in _concatenate_chunked_arrays
    assert type_ == arr.type, f"Types mismatch: {type_} != {arr.type}"
AssertionError: Types mismatch: null != string
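The reason this happens is that the type check in _concatenate_chunked_arrays is overly strict: it asserts that every chunk of a column has exactly the same type, so a null-typed "label" chunk is rejected when it meets a string-typed one. We should be able to combine null and string data.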
bveeramani added the bug, P1, and data labels on Dec 18, 2024.
Versions / Dependencies
3362ef4
Reproduction script
Issue Severity
Medium: It is a significant difficulty but I can work around it.