Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added logic to control which stages must be pushed down or skipped from SQL engine execution. #14061

Merged
merged 1 commit into from
Feb 26, 2022

Conversation

fernst
Copy link
Member

@fernst fernst commented Feb 24, 2022

Added additional methods to ensure that input and output schemas for stages are supported by the SQL engine. This is useful when verifying if all input and output types in an aggregation are supported by the SQL engine.

@gitpod-io
Copy link

gitpod-io bot commented Feb 24, 2022

@fernst fernst force-pushed the add-options-to-control-sql-engine-stages branch from 6faf269 to d89961f Compare February 24, 2022 02:15
Copy link
Contributor

@tivv tivv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests

protected Iterable<SparkCollectionRelationalEngine> getRelationalEngines(SparkCollection<Object> stageData) {
if (sqlEngineAdapter == null || !sqlEngineAdapter.supportsRelationalTranform()
|| !(stageData instanceof SQLBackedCollection)) {
protected Iterable<SparkCollectionRelationalEngine> getRelationalEngines(String stageName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases where stageSpec.getName() != stageName? Please add new params to javadoc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no, there are not. Let me update the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


// If this stage is not pushed down and it's not in the included stages, we can skip relational transformation
// in the SQL engine.
if (!(stageData instanceof SQLBackedCollection || shouldForcePushToSQLEngine(stageName))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I am wondering if it would be more readable as !A && !B vs !(A || B). It would more directly correspond to the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@fernst fernst force-pushed the add-options-to-control-sql-engine-stages branch from d89961f to 8bfd279 Compare February 26, 2022 03:28
@fernst fernst requested a review from tivv February 26, 2022 03:28
@fernst
Copy link
Member Author

fernst commented Feb 26, 2022

@tivv I've added tests.

@fernst fernst force-pushed the add-options-to-control-sql-engine-stages branch from 8bfd279 to 7cef84c Compare February 26, 2022 03:33
Copy link
Contributor

@tivv tivv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good, could you please also support "excluded" for io.cdap.cdap.etl.spark.batch.SQLBackedCollection#tryStoreDirect ? It would allow to skiip it in case of any issues

…om SQL engine execution.

Added additional methods to ensure that input and output schemas for stages are supported by the SQL engine. This is useful when verifying if all input and output types in an aggregation are supported by the SQL engine.
@fernst fernst force-pushed the add-options-to-control-sql-engine-stages branch from 7cef84c to 3ee9301 Compare February 26, 2022 06:03
@fernst
Copy link
Member Author

fernst commented Feb 26, 2022

Overall good, could you please also support "excluded" for io.cdap.cdap.etl.spark.batch.SQLBackedCollection#tryStoreDirect ? It would allow to skip it in case of any issues

I've added logic to check if the sink stage is in the "Excluded" list before attempting to write to the sink.

@fernst fernst requested a review from tivv February 26, 2022 06:04
@fernst fernst added the build Triggers github actions build label Feb 26, 2022
@fernst fernst merged commit 5285268 into develop Feb 26, 2022
@fernst fernst deleted the add-options-to-control-sql-engine-stages branch February 26, 2022 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bq-pushdown build Triggers github actions build
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants