Skip to content

UNION ALL not correctly projects the floating numbers #10688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
comphead opened this issue May 27, 2024 · 12 comments
Closed

UNION ALL not correctly projects the floating numbers #10688

comphead opened this issue May 27, 2024 · 12 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@comphead
Copy link
Contributor

there is some issue with UNION ALL and type derivation

> select avg(a) from (select -128.2::float a union all select 32768.3 union all select 27.3);
+--------------------+
| AVG(a)             |
+--------------------+
| 10889.133334350587 |
+--------------------+
1 row(s) fetched. 
Elapsed 0.011 seconds.

> select avg(a) from (select -128.2::float a union all select 32768.3::float union all select 27.3::float);
+-------------------+
| AVG(a)            |
+-------------------+
| 10889.13359451294 |
+-------------------+
1 row(s) fetched. 
Elapsed 0.011 seconds.

I expect those numbers to be the same. And that what duck db does

Originally posted by @comphead in #10634 (review)

@viirya
Copy link
Member

viirya commented May 28, 2024

When -128.2::float is treated as float64, there is some precision difference. It should not be related to average function.

> explain select avg(a) from (select -128.2::float a union all select 32768.3 union all select 27.3);
+---------------+--------------------------------------------------------+
| plan_type     | plan                                                   |
+---------------+--------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[]], aggr=[[AVG(a)]]               |
|               |   Union                                                |
|               |     Projection: Float64(-128.1999969482422) AS a       |
|               |       EmptyRelation                                    |
|               |     Projection: Float64(32768.3) AS a                  |
|               |       EmptyRelation                                    |
|               |     Projection: Float64(27.3) AS a                     |
|               |       EmptyRelation                                    |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[AVG(a)]       |
|               |   CoalescePartitionsExec                               |
|               |     AggregateExec: mode=Partial, gby=[], aggr=[AVG(a)] |
|               |       UnionExec                                        |
|               |         ProjectionExec: expr=[-128.1999969482422 as a] |
|               |           PlaceholderRowExec                           |
|               |         ProjectionExec: expr=[32768.3 as a]            |
|               |           PlaceholderRowExec                           |
|               |         ProjectionExec: expr=[27.3 as a]               |
|               |           PlaceholderRowExec                           |
|               |                                                        |
+---------------+--------------------------------------------------------+

@comphead
Copy link
Contributor Author

Thats weird.

> explain select -128.2::float union all select -128.2;
+---------------+-------------------------------------------------------------------+
| plan_type     | plan                                                              |
+---------------+-------------------------------------------------------------------+
| logical_plan  | Union                                                             |
|               |   Projection: Float64(-128.1999969482422) AS (- Float64(128.2))   |
|               |     EmptyRelation                                                 |
|               |   Projection: Float64(-128.2) AS (- Float64(128.2))               |
|               |     EmptyRelation                                                 |
| physical_plan | UnionExec                                                         |
|               |   ProjectionExec: expr=[-128.1999969482422 as (- Float64(128.2))] |
|               |     PlaceholderRowExec                                            |
|               |   ProjectionExec: expr=[-128.2 as (- Float64(128.2))]             |
|               |     PlaceholderRowExec                                            |
|               |                                                                   |
+---------------+-------------------------------------------------------------------+

So first values transforms into -128.1999969482422, but the second is intact

@comphead comphead changed the title UNION ALL and AVG returns unexpected result for floats UNION ALL not correctly projects floats May 28, 2024
@comphead comphead changed the title UNION ALL not correctly projects floats UNION ALL not correctly projects the floating numbers May 28, 2024
@goldmedal
Copy link
Contributor

take

@viirya
Copy link
Member

viirya commented May 29, 2024

Thats weird.

explain select -128.2::float union all select -128.2;

I think it is because the first -128.2 is read as float 32, then treated as float 64. I said "treated" because I don't see there is cast expression. It may be happened when DataFusion parses float literal and resolves its data type.

@goldmedal
Copy link
Contributor

I think it is because the first -128.2 is read as float 32, then treated as float 64. I said "treated" because I don't see there is cast expression. It may be happened when DataFusion parses float literal and resolves its data type.

Agreed. I also found it.

> explain select 128.2::float;
+---------------+------------------------------------------------+
| plan_type     | plan                                           |
+---------------+------------------------------------------------+
| logical_plan  | Projection: Float32(128.2) AS Float64(128.2)   |
|               |   EmptyRelation                                |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
|               |   PlaceholderRowExec                           |
|               |                                                |
+---------------+------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.004 seconds.

> explain select 128.2;
+---------------+------------------------------------------------+
| plan_type     | plan                                           |
+---------------+------------------------------------------------+
| logical_plan  | Projection: Float64(128.2)                     |
|               |   EmptyRelation                                |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
|               |   PlaceholderRowExec                           |
|               |                                                |
+---------------+------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.004 seconds.

> explain select 128.2::double;
+---------------+------------------------------------------------+
| plan_type     | plan                                           |
+---------------+------------------------------------------------+
| logical_plan  | Projection: Float64(128.2)                     |
|               |   EmptyRelation                                |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
|               |   PlaceholderRowExec                           |
|               |                                                |
+---------------+------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.003 seconds.

> explain select -128.2 union all select -128.2::double;
+---------------+----------------------------------------------------+
| plan_type     | plan                                               |
+---------------+----------------------------------------------------+
| logical_plan  | Union                                              |
|               |   Projection: Float64(-128.2) AS Float64(-128.2)   |
|               |     EmptyRelation                                  |
|               |   Projection: Float64(-128.2) AS Float64(-128.2)   |
|               |     EmptyRelation                                  |
| physical_plan | UnionExec                                          |
|               |   ProjectionExec: expr=[-128.2 as Float64(-128.2)] |
|               |     PlaceholderRowExec                             |
|               |   ProjectionExec: expr=[-128.2 as Float64(-128.2)] |
|               |     PlaceholderRowExec                             |
|               |                                                    |
+---------------+----------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.008 seconds.

> explain select -128.2 union all select -128.2::float;
+---------------+----------------------------------------------------------------+
| plan_type     | plan                                                           |
+---------------+----------------------------------------------------------------+
| logical_plan  | Union                                                          |
|               |   Projection: Float64(-128.2) AS Float64(-128.2)               |
|               |     EmptyRelation                                              |
|               |   Projection: Float64(-128.1999969482422) AS Float64(-128.2)   |
|               |     EmptyRelation                                              |
| physical_plan | UnionExec                                                      |
|               |   ProjectionExec: expr=[-128.2 as Float64(-128.2)]             |
|               |     PlaceholderRowExec                                         |
|               |   ProjectionExec: expr=[-128.1999969482422 as Float64(-128.2)] |
|               |     PlaceholderRowExec                                         |
|               |                                                                |
+---------------+----------------------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.008 seconds.

If we cast 128.2 to double, the value won't be changed. I guess the reason is that DataFusion performs type coercion between float32 and float64 when combining two select plans.

@goldmedal
Copy link
Contributor

I wanted to share something I found. I tried to observe the plan transformation step by step:
sql_to_plan -> analyze -> optimize

SQL: select 128.2 union all select 128.2::float
********
Original LogicalPlan:
 Union
  Projection: Float64(128.2) AS Float64(128.2)
    EmptyRelation
  Projection: CAST(CAST(Float64(128.2) AS Float32) AS Float64) AS Float64(128.2)
    EmptyRelation
********
Do some analyze:
 Union
  Projection: Float64(128.2) AS Float64(128.2)
    EmptyRelation
  Projection: CAST(CAST(Float64(128.2) AS Float32) AS Float64) AS Float64(128.2)
    EmptyRelation
********
Do some optimize:
 Union
  Projection: Float64(128.2) AS Float64(128.2)
    EmptyRelation
  Projection: Float64(128.1999969482422) AS Float64(128.2)
    EmptyRelation
********

I found that CAST is removed after the plan is optimized. I'm not sure which optimization rule does this—maybe unwrap_cast_in_comparison.rs? I'm not sure. I'll keep tracking it tomorrow.

@goldmedal
Copy link
Contributor

After some testing, I found it could be an arrow-cast issue. When trying to cast a float32 to float64, the precision is missing.
We can easy to reproduce this case like:

> select (128.2::float)::double
;
+-------------------+
| Float64(128.2)    |
+-------------------+
| 128.1999969482422 |
+-------------------+
1 row(s) fetched. 
Elapsed 0.089 seconds.

I found it happens in SimplifyExpressions rule. SimpleExpr will try to evaluate the casting expression.

let cast_arr = cast_with_options(&self.to_array()?, data_type, &cast_options)?;

cast_to will invoke the arrow-cast to get the result.
It can also be reproduced like

    let expr2 = ScalarValue::Float64(Some(128.2))
    .cast_to(&DataType::Float32)
    .unwrap()
    .cast_to(&DataType::Float64)
    .unwrap();
    println!("expr2: {:?}", expr2);
------------------------------------
expr2: Float64(128.1999969482422)

It happens in arrow-cast-51.0.0/src/cast.rs (I'm not sure where is the code placed).

(Float16, Float32) => cast_numeric_arrays::<Float16Type, Float32Type>(array, cast_options),

I think it maybe not a DataFusion issue but a arrow-cast issue.

@comphead @viirya What do you think?

@comphead
Copy link
Contributor Author

good catch, we can check that

Even simple conversion gives weird result

        let expr2 = ScalarValue::Float32(Some(128.2))
            .cast_to(&DataType::Float64)
            .unwrap()
            ;
        println!("expr2: {:?}", expr2);

        expr2: Float64(128.1999969482422)

Opposite conversion is ok though

        let expr2 = ScalarValue::Float64(Some(128.2))
            .cast_to(&DataType::Float32)
            .unwrap();
        println!("expr2: {:?}", expr2);

        expr2: Float32(128.2)

For me looks like a conversion issue, especially strange its happening when we convert into larger precision.

@goldmedal
Copy link
Contributor

Indeed, it's a conversion issue. I tried to dig into the root cause and found that arrow-cast uses the crate num to handle casting problems. I tried to use it directly:

    let value_32: f32 = 128.2;
    println!("value_32: {:?}", value_32);
    let value_64: f64 = num::cast(value_32).unwrap();
    println!("value_64: {:?}", value_64);
-------------------
value_32: 128.2
value_64: 128.1999969482422

I have no idea how to fix it in DataFusion. Maybe we need to go back to the crate num to fix this behavior?
@comphead WDYT?

@comphead
Copy link
Contributor Author

Well, that might be some representation issue.

    println!("value_64: {:?}", 128.2_f32 as f64);

@comphead
Copy link
Contributor Author

Filed rust-lang/rust#125777

@comphead
Copy link
Contributor Author

comphead commented Jun 3, 2024

So this looks like a representation hm... feature, it can be hard to explain for users but it is float output semantics

@comphead comphead closed this as not planned Won't fix, can't repro, duplicate, stale Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants