Commit
Fix the ORC decoding bug for the timestamp data (#17570)
This PR introduces a band-aid class, `run_cache_manager`, to handle an edge case in the TIMESTAMP data type in which the DATA stream (seconds) is decoded ahead of the SECONDARY stream (nanoseconds) and the excess rows are lost. The fix uses `run_cache_manager` (together with `cache_helper`, which is an implementation detail) to cache the DATA stream values that would otherwise be dropped and to make them available to the next decoding iteration, thus preventing data loss.

Closes #17155

Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
- Matthew Murray (https://github.com/Matt711)
- Vukasin Milovanovic (https://github.com/vuule)

URL: #17570
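
The caching idea described above can be illustrated with a small host-side sketch. This is not the cuDF implementation (which lives in the GPU decoding code); `RunCache` and `decode_timestamps` are hypothetical names, and the batch layout is an assumption chosen so that the seconds stream runs ahead of the nanoseconds stream in the first iteration.

```python
from collections import deque


class RunCache:
    """Hypothetical stand-in for the idea behind run_cache_manager:
    hold seconds values that were decoded before their nanoseconds arrived."""

    def __init__(self):
        self._pending = deque()

    def stash(self, values):
        # Keep the excess values for the next decoding iteration.
        self._pending.extend(values)

    def drain(self):
        # Return and clear everything cached so far.
        out = list(self._pending)
        self._pending.clear()
        return out


def decode_timestamps(seconds_batches, nanos_batches, use_cache=True):
    """Pair seconds with nanoseconds, one decoding iteration per batch.

    Assumes the seconds stream is never behind the nanoseconds stream,
    which mirrors the scenario in the commit message.
    """
    cache = RunCache()
    rows = []
    for secs, nanos in zip(seconds_batches, nanos_batches):
        carried = cache.drain() if use_cache else []
        available = carried + list(secs)
        n = min(len(available), len(nanos))
        rows.extend(zip(available[:n], nanos[:n]))
        if use_cache:
            cache.stash(available[n:])  # without this, the excess is dropped
    return rows


# First iteration yields 5 seconds values but only 3 nanoseconds values.
seconds_batches = [[10, 11, 12, 13, 14], [15]]
nanos_batches = [[100, 101, 102], [103, 104, 105]]

fixed = decode_timestamps(seconds_batches, nanos_batches, use_cache=True)
buggy = decode_timestamps(seconds_batches, nanos_batches, use_cache=False)
print(len(fixed), len(buggy))
```

With the cache enabled, all six rows are paired correctly. With it disabled, the two excess seconds values from the first iteration are discarded, so fewer rows come out and the later ones are misaligned.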
1 parent caf97ef, commit 4e97cd4
Showing 4 changed files with 225 additions and 4 deletions.
Binary file added (+5.7 KB): python/cudf/cudf/tests/data/orc/TestOrcFile.timestamp.desynced.snappy.RLEv2.orc
Binary file added (+5.68 KB): python/cudf/cudf/tests/data/orc/TestOrcFile.timestamp.desynced.uncompressed.RLEv2.orc