Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter pushdown issue with aggregating inline views and CTE's #640

Open
ralfbecher opened this issue May 13, 2020 · 5 comments
Open

Filter pushdown issue with aggregating inline views and CTE's #640

ralfbecher opened this issue May 13, 2020 · 5 comments
Labels
backend:server Issues relating to Cube Core's Server enhancement New feature proposal

Comments

@ralfbecher
Copy link

ralfbecher commented May 13, 2020

Filters are applied to the outer SELECT. This cannot work with aggregating inline views and CTE's (common table expressions if dimensionality is reduced. With FILTER_PARAMS we can push down the filter inside the inline view or CTS like this:

Having this simple example with a transaction table:

CREATE TABLE food_sales (
        CREATED_TS      timestamp PRIMARY KEY,
        ITEM            string,
        AMOUNT          integer
);

and a Cube defined for aggregating top-5 with all others (typical pie chart) we want to filter for AMOUNT < n (eg. finding top-5 items in a lower segment):

cube(`FoodSales_Aggr_Top5`, {
  sql: `WITH top5 AS (
      SELECT ITEM, sum(AMOUNT) AS AMOUNT_SUM
      FROM food_sales
      WHERE ${FILTER_PARAMS.FoodSales_Aggr_Top5.amount.filter('AMOUNT')}
      GROUP BY ITEM
      ORDER BY AMOUNT_SUM DESC 
      LIMIT 5
    ) 
    SELECT * FROM top5
    UNION ALL
    SELECT 'Others' as ITEM, sum(AMOUNT) AS AMOUNT_SUM
    FROM food_sales
    WHERE ${FILTER_PARAMS.FoodSales_Aggr_Top5.amount.filter('AMOUNT')} 
    AND ITEM NOT IN (SELECT ITEM FROM top5)`,
  dataSource: `default`,

  title: `Food Sales Top 5`,
  description: `Food Sales aggregated Top5 sales`,

  dimensions: {
    created: {
      sql: 'CREATED_TS',
      type: 'time',
      primaryKey: true,
      shown: false
    },
    item: {
      sql: `ITEM`,
      type: `string`,
    },
    amount: {
      sql: `AMOUNT`,
      type: `number`,
      shown: false
    }
  },

  measures: {
    sumAmount: {
      sql: `AMOUNT_SUM`,
      type: `sum`
    }

  }
});

Now we define this query with the filter AMOUNT < 1000:

{
  measures: [
    'FoodSales_Aggr_Top5.sumAmount'
  ],
  dimensions: [
    'FoodSales_Aggr_Top5.item'
  ],
  filters: [{
    member: 'FoodSales_Aggr_Top5.amount',
    operator: 'lt',
    values: ['1000']
  }]
}

This is the generated SQL:

SELECT  "food_sales__aggr__top5".ITEM "food_sales__aggr__top5__item",
        sum("food_sales__aggr__top5".AMOUNT_SUM) "food_sales__aggr__top5__sum_amount"
FROM (
        WITH top5 AS (
                SELECT  ITEM,
                        sum(AMOUNT) AS AMOUNT_SUM
                FROM food_sales 
                WHERE AMOUNT < ?
                GROUP BY ITEM
                ORDER BY AMOUNT_SUM DESC
                LIMIT 5)
        SELECT * FROM top5
        UNION ALL
        SELECT  'Others' as ITEM,
                sum(AMOUNT) AS AMOUNT_SUM
        FROM food_sales
        WHERE AMOUNT < ? AND ITEM NOT IN (SELECT ITEM FROM top5)
) AS "food_sales__aggr__top5"
WHERE ("food_sales__aggr__top5".AMOUNT < ?)
GROUP BY 1
ORDER BY 2 DESC

As we see we have an additional WHERE clause on the outer SELECT where we do not have the AMOUNT field because of aggregation.

My expectation is that this is not needed here because we have explicitly used FILTER_PARAMS for this field inside the SQL. Also, I would expect the field AMOUNT would not appear in the outer SELECT since I set shown: false in the Cube definition.

Maybe there is already a solution to such kind of queries and I haven't found it yet?

@paveltiunov
Copy link
Member

@ralfbecher Hey Ralf! That's a great question! As of now the best way to approach it is with multiple queries:
For top items

{
  measures: [
    'FoodSales.sumAmount'
  ],
  dimensions: [
    'FoodSales.item'
  ],
  order: {
    'FoodSales.sumAmount': 'desc'
  },
  limit: 5
}

And then for others

{
  measures: [
    'FoodSales.sumAmount'
  ],
  filters: [{
    dimensions: 'FoodSales.item',
    operator: 'notEquals',
    values: [...topItems]
  }]
}

It isn't obvious but those 2 separate queries usually faster than a big one and also easier to maintain. We're considering to make these "other" queries as part of an API.

Let's use this issue to track this API enhancement.

@paveltiunov paveltiunov added the enhancement New feature proposal label May 13, 2020
@ralfbecher
Copy link
Author

@paveltiunov Hi Pavel, thanks for your suggestion. I also had in mind to decompose the SQL but then we run into a pipeline approach with longer roundtrip times.

However, my intention here was not to solve the top-5 "other" problem (it was just one example), and I would appreciate any solution in Cube.js for that, but to address the filter pushdown (predicate pushdown) inside the inner SELECT statement(s) or CTEs especially for aggregation queries.

This means to have a control where a WHERE clause is added. And I think if we explicitly use FILTER_PARAMS these filters should not be added as WHERE clause to the enclosing outer SELECT statement, or we should be able to control this by a filter or dimension property.

@paveltiunov
Copy link
Member

@ralfbecher Could you please provide exact SQL query you need to reproduce then?

@ralfbecher
Copy link
Author

@paveltiunov Let's have an inline view with aggregation where I want to filter on a criteria which is not part of the dimensionality (where amount > 1000, or where country = 'UK'):

SELECT 
  item, 
  week, 
  amount_sum - lag(amount_sum) over(PARTITION BY item ORDER BY week) as amount_diff
FROM  (
  SELECT 
    item, 
    WEEK(created_ts) as week, 
    sum(amount) as amount_sum
  FROM food_sales
  WHERE amount > 1000
  GROUP BY 
    item, 
    week
)

@hassankhan hassankhan added the backend:server Issues relating to Cube Core's Server label May 5, 2021
@rccoe
Copy link
Contributor

rccoe commented Dec 14, 2021

Seconding this request - there should be an ability to pass filters into the query that are not also added to the WHERE clause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:server Issues relating to Cube Core's Server enhancement New feature proposal
Projects
None yet
Development

No branches or pull requests

4 participants