Enable querying policy-enabled table in MSQ, and use RestrictedDataSource as a base in DataSourceAnalysis. #17666

Open
wants to merge 13 commits into
base: master

Contributor

@cecemei cecemei commented Jan 25, 2025

Description

This PR enables querying policy-enabled tables in MSQ.


Key changed/added classes in this PR
  • DataSourceAnalysis: getBaseTableDataSource can now return the base of a RestrictedDataSource. This is more robust than using the underlying table as the base.
  • MSQTaskQueryMaker now adds policies to the query instead of throwing a permission error.
  • DataSourcePlan can now handle RestrictedDataSource.
  • A new class, RestrictedInputNumberDataSource, which wraps a NumberDataSource with a policy; its SegmentMapFn can be used to create a RestrictedSegment.
  • RunWorkOrder: a few refactors to make the code clearer, with no behavior change. In ShufflePipelineBuilder.build(), it was previously unclear that the channel future should only be returned once the resultFuture is ready. The sanity check has also been moved to OutputChannels.
  • Added tests in MSQSelectTest, MSQReplaceTest, MSQInsertTest, MSQExportTest.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jan 25, 2025
@cecemei cecemei changed the title Enable RestrictedDataSource as a base in DataSourceAnalysis, and enab… Enable querying policy-enabled table in MSQ, and use RestrictedDataSource as a base in DataSourceAnalysis. Jan 25, 2025
@cecemei cecemei marked this pull request as ready for review January 28, 2025 04:08
Member

@clintropolis clintropolis left a comment

Had a first pass and have some questions and thoughts.

Also, could you try to avoid reformatting entire files? All of these unrelated formatting changes make review harder than it should be. I know it's just the tooling adhering to the style rules, but my preference would be to make these cosmetic changes in standalone PRs as you notice them, to keep reviews simple.

- workerToTaskIds.compute(i, (workerId, taskIds) -> {
-   if (taskIds == null) {
-     taskIds = new ArrayList<>();
-   }
-   taskIds.add(task.getId());
-   return taskIds;
- });
+ workerToTaskIds.computeIfAbsent(i, (unused) -> (new ArrayList<>())).add(task.getId());

Member

this isn't equivalent, previously it would always add the taskId to the worker, now it only adds if the worker isn't there, is that ok?

Contributor

Yeah this looks weird.
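
The question in this thread comes down to what `Map.computeIfAbsent` returns. Per the `java.util.Map` contract, it returns the current value for the key (the existing one, or the one just computed), so a chained `add()` runs whether or not the key was already present. The following is a minimal standalone sketch, not code from the PR: the class name, method names, and sample task IDs are hypothetical, and plain strings stand in for `task.getId()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical standalone comparison; not part of the PR.
public class WorkerTaskIdGrouping
{
  // Original form: create the list if it is missing, then always append.
  public static Map<Integer, List<String>> groupWithCompute(int[] workers, String[] taskIds)
  {
    final Map<Integer, List<String>> workerToTaskIds = new HashMap<>();
    for (int n = 0; n < workers.length; n++) {
      final String taskId = taskIds[n];
      workerToTaskIds.compute(workers[n], (workerId, ids) -> {
        if (ids == null) {
          ids = new ArrayList<>();
        }
        ids.add(taskId);
        return ids;
      });
    }
    return workerToTaskIds;
  }

  // One-line form: computeIfAbsent returns the existing list when the key is
  // present, or the newly created list otherwise; add() is then called on
  // whichever list was returned.
  public static Map<Integer, List<String>> groupWithComputeIfAbsent(int[] workers, String[] taskIds)
  {
    final Map<Integer, List<String>> workerToTaskIds = new HashMap<>();
    for (int n = 0; n < workers.length; n++) {
      workerToTaskIds.computeIfAbsent(workers[n], (unused) -> new ArrayList<>()).add(taskIds[n]);
    }
    return workerToTaskIds;
  }

  public static void main(String[] args)
  {
    // Worker 0 appears twice, so the second task for it exercises the "key already present" path.
    final int[] workers = {0, 1, 0};
    final String[] taskIds = {"task-a", "task-b", "task-c"};
    System.out.println(groupWithCompute(workers, taskIds));
    System.out.println(groupWithComputeIfAbsent(workers, taskIds));
  }
}
```

Both methods build the same map for the sample input, with worker 0 holding two task IDs, which suggests the one-line form still appends on every call.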

* join tree.
*/
@JsonTypeName("restrictedInputNumber")
public class RestrictedInputNumberDataSource implements DataSource

Member

should this be InputNumberRestrictedDataSource instead?

Contributor Author

It's a matter of preference, but I feel RestrictedInputNumberDataSource puts more emphasis on the Restricted part.

Contributor

@cryptoe cryptoe left a comment

Could you please move the formatting changes and refactors into a separate PR? This PR is quite hard to review with all the refactors and formatting changes.

public class RestrictedInputNumberDataSource implements DataSource
{
private final int inputNumber;
private final Policy policy;

Contributor

Shouldn't this be a list of policies?

Contributor Author

We agreed on a single-policy approach per datasource; the full discussion is here: #17564 (comment)

@@ -571,9 +570,6 @@ private static Triple<DataSource, DimFilter, List<PreJoinableClause>> flattenJoi
} else if (current instanceof UnnestDataSource) {
final UnnestDataSource unnestDataSource = (UnnestDataSource) current;
current = unnestDataSource.getBase();
} else if (current instanceof RestrictedDataSource) {

Contributor

Could you please explain the removal here?

Contributor Author

This was added in the first security PR, where TableDataSource was used as the base in DataSourceAnalysis. That was not a good approach, because the security filter might get lost in withUpdatedDataSource. In this PR, RestrictedDataSource is used as the base, which guarantees the security filter stays in place.

return new DataSourcePlan(
(broadcast && dataSource.isGlobal())
? dataSource
: new RestrictedInputNumberDataSource(0, dataSource.getPolicy()),

Member

I wonder whether it's really necessary to delay policy evaluation until the point where the cursor is created.
If so, wouldn't it be an option to just save the details required to interpret the policy fully, instead of forcing it to be the first parent of TableDataSource? That approach requires working around things that should already work, and needs classes that are more or less copies of others.

Code scanning / CodeQL

Check notice: Useless parameter (Note, test). Each flagged test method is annotated with:

@MethodSource("data")
@ParameterizedTest(name = "{index}:with context {0}")

The parameter 'contextName' is never used in:
  • public void testInsertOnRestricted(String contextName, Map<String, Object> context)
  • public void testReplaceOnRestricted(String contextName, Map<String, Object> context)

The parameter 'unusedContextName' is never used in:
  • public void testExport(String unusedContextName, Map<String, Object> context) throws IOException
  • public void testExport2(String unusedContextName, Map<String, Object> context) throws IOException
  • public void testExportRestricted(String unusedContextName, Map<String, Object> context) throws IOException
  • public void testNumberOfRowsPerFile(String unusedContextName, Map<String, Object> context) throws IOException
  • public void testExportComplexColumns(String unusedContextName, Map<String, Object> context) throws IOException
  • public void testExportSketchColumns(String unusedContextName, Map<String, Object> context) throws IOException
  • public void testEmptyExport(String unusedContextName, Map<String, Object> context) throws IOException
  • public void testExportWithLimit(String unusedContextName, Map<String, Object> context) throws IOException

Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying

4 participants