Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panicked at 'index out of bounds: the len is 0 but the index is 0' in datafusion::physical_plan::projection::validate_output_ordering #7482

Closed
alamb opened this issue Sep 5, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Sep 5, 2023

Describe the bug

Similarly to #7418 when I am working on an upgrade to DataFusion in IOx (https://github.com/influxdata/influxdb_iox/pull/8577) we get a failure like this:

---- physical_optimizer::projection_pushdown::tests::test_integration stdout ----
thread 'physical_optimizer::projection_pushdown::tests::test_integration' panicked at 'index out of bounds: the len is 0 but the index is 0', /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/core/src/physical_plan/projection.rs:361:12
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_bounds_check
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:162:5
   3: <usize as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/slice/index.rs:261:10
   4: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/slice/index.rs:18:9
   5: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/vec/mod.rs:2675:9
   6: datafusion::physical_plan::projection::validate_output_ordering::{{closure}}
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/core/src/physical_plan/projection.rs:361:12
   7: core::option::Option<T>::and_then
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/option.rs:1413:24
   8: datafusion::physical_plan::projection::validate_output_ordering
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/core/src/physical_plan/projection.rs:357:5
   9: datafusion::physical_plan::projection::ProjectionExec::try_new
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/core/src/physical_plan/projection.rs:151:13
  10: iox_query::physical_optimizer::projection_pushdown::wrap_user_into_projections
             at ./src/physical_optimizer/projection_pushdown.rs:398:25
  11: <iox_query::physical_optimizer::projection_pushdown::ProjectionPushdown as datafusion::physical_optimizer::optimizer::PhysicalOptimizerRule>::optimize::{{closure}}
             at ./src/physical_optimizer/projection_pushdown.rs:214:32
  12: datafusion_common::tree_node::TreeNode::transform_down
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/common/src/tree_node.rs:124:24
  13: datafusion_common::tree_node::TreeNode::transform_down::{{closure}}
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/common/src/tree_node.rs:125:38
  14: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/map.rs:91:28
  15: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/iterator.rs:2303:21
  16: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/map.rs:117:9
  17: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/mod.rs:195:9
  18: <I as alloc::vec::in_place_collect::SpecInPlaceCollect<T,I>>::collect_in_place
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/vec/in_place_collect.rs:258:13
  19: alloc::vec::in_place_collect::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/vec/in_place_collect.rs:182:28
  20: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/vec/mod.rs:2696:9
  21: core::iter::traits::iterator::Iterator::collect
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/iterator.rs:1895:9
  22: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1932:51
  23: core::iter::adapters::try_process
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/mod.rs:164:17
  24: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1932:9
  25: core::iter::traits::iterator::Iterator::collect
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/iterator.rs:1895:9
  26: <alloc::sync::Arc<T> as datafusion_common::tree_node::TreeNode>::map_children
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/common/src/tree_node.rs:343:17
  27: datafusion_common::tree_node::TreeNode::transform_down
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/common/src/tree_node.rs:125:9
  28: datafusion_common::tree_node::TreeNode::transform_down::{{closure}}
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/common/src/tree_node.rs:125:38
  29: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/map.rs:91:28
  30: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/iterator.rs:2303:21
  31: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/map.rs:117:9
  32: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/mod.rs:195:9
  33: <I as alloc::vec::in_place_collect::SpecInPlaceCollect<T,I>>::collect_in_place
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/vec/in_place_collect.rs:258:13
  34: alloc::vec::in_place_collect::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/vec/in_place_collect.rs:182:28
  35: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/vec/mod.rs:2696:9
  36: core::iter::traits::iterator::Iterator::collect
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/iterator.rs:1895:9
  37: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1932:51
  38: core::iter::adapters::try_process
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/adapters/mod.rs:164:17
  39: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1932:9
  40: core::iter::traits::iterator::Iterator::collect
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/iterator.rs:1895:9
  41: <alloc::sync::Arc<T> as datafusion_common::tree_node::TreeNode>::map_children
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/common/src/tree_node.rs:343:17
  42: datafusion_common::tree_node::TreeNode::transform_down
             at /Users/alamb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/3ea870c/datafusion/common/src/tree_node.rs:125:9
  43: <iox_query::physical_optimizer::projection_pushdown::ProjectionPushdown as datafusion::physical_optimizer::optimizer::PhysicalOptimizerRule>::optimize
             at ./src/physical_optimizer/projection_pushdown.rs:41:9
  44: iox_query::physical_optimizer::test_util::OptimizationTest::new_with_config
             at ./src/physical_optimizer/test_util.rs:42:29
  45: iox_query::physical_optimizer::test_util::OptimizationTest::new
             at ./src/physical_optimizer/test_util.rs:27:9
  46: iox_query::physical_optimizer::projection_pushdown::tests::test_integration
             at ./src/physical_optimizer/projection_pushdown.rs:1364:13
  47: iox_query::physical_optimizer::projection_pushdown::tests::test_integration::{{closure}}
             at ./src/physical_optimizer/projection_pushdown.rs:1318:27
  48: core::ops::function::FnOnce::call_once
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
  49: core::ops::function::FnOnce::call_once
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

To Reproduce

No response

Expected behavior

No response

Additional context

I believe it is failing in validate_output_ordering (a different function added in #7364 by @berkaysynnada (looks like it needs a check for empty).

Clearly our coverage is not good in this regard (aka I am not sure what it is about IOx that it is creating plans that report output ordering with empty ordering

@alamb alamb added the bug Something isn't working label Sep 5, 2023
@alamb
Copy link
Contributor Author

alamb commented Sep 5, 2023

I wonder if we could improve this situation by encapsulating the creation of OutputOrdering so that Some(vec![]) was not possible to create 🤔

@alamb
Copy link
Contributor Author

alamb commented Sep 5, 2023

BTW I found another workaround in IOx. We had configured a FileScanConfig like this (with output_ordering: vec![vec![]]) :

        let base_config = FileScanConfig {
            object_store_url: self.object_store_url.clone(),
            file_schema: schema,
            file_groups: vec![vec![PartitionedFile {
                object_meta: self.object_meta.clone(),
                partition_values: vec![],
                range: None,
                extensions: None,
            }]],
            statistics: Statistics::default(),
            projection: None,
            limit: None,
            table_partition_cols: vec![],
            // Parquet files ARE actually sorted but we don't care here since we just construct a `collect` plan.
            output_ordering: vec![],
            infinite_source: false,
        };

I could stop the crashes like this:

diff --git a/parquet_file/src/storage.rs b/parquet_file/src/storage.rs
index 285e272f7..c520e3bd0 100644
--- a/parquet_file/src/storage.rs
+++ b/parquet_file/src/storage.rs
@@ -137,7 +137,7 @@ impl ParquetExecInput {
             limit: None,
             table_partition_cols: vec![],
             // Parquet files ARE actually sorted but we don't care here since we just construct a `collect` plan.
-            output_ordering: vec![vec![]],
+            output_ordering: vec![],
             infinite_source: false,
         };
         let exec = ParquetExec::new(base_config, None, None);

I do think this illustrates why having a dedicated structure to encapsulate the output orderings might be nice. But definitely not necessary

@berkaysynnada
Copy link
Contributor

Hi @alamb

I can quickly send a fix PR with a similar check, just like this fix. As I mentioned, Datafusion does not cause the creation of such an output_ordering = Some(vec![]), but we have taken the measure for this in find_orderings_of_exprs, so I think we can implement it here too.

@alamb
Copy link
Contributor Author

alamb commented Sep 7, 2023

I can quickly send a fix PR with a similar check, just like #7457. As I mentioned, Datafusion does not cause the creation of such an output_ordering = Some(vec![]), but we have taken the measure for this in find_orderings_of_exprs, so I think we can implement it here too.

Thanks -- I don't think the fix would be high priority (given I have fixed IOx tests not to create this type of ordering it is not blocking anything in my mind)

In general, I think a better use of time would be to change the API so that it is not possible to hit this

So instead of
https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html#tymethod.output_ordering

One way might be to do something like this:

fn output_ordering(&self) -> &OutputOrdering { ..}

struct OutputOrdering {
  exprs: Vec<PhysicalSortExpr>
}

impl OutputOrdering {
  fn append(&mut self, order: PhysicalSortExpr) {
   if !order.is_empty() {
    self.exprs.push(order)
   }
}

@sergiimk
Copy link
Contributor

sergiimk commented Nov 6, 2023

Note: the issues is present in datafusion 32.0.0 (although seemingly reported a month before the release).

I could not figure out a workaround for my case (read_csv -> complex processing -> write_parquet) as I didn't see any place where my code has direct control over output_ordering.

It's unfortunate that for a 2-month-old crash there is no patch release, but I will test against 33.0.0-rc1 and hope it fixes it.

Edit: 33.0.0-rc1 does fix the issue

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2023

Closing this issue as it was fixed in 33.0.0. Thanks @sergiimk

@alamb alamb closed this as completed Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants