Enable querying policy-enabled table in MSQ, and use RestrictedDataSource as a base in DataSourceAnalysis. #17666

cecemei · 2025-01-25T00:43:25Z

Description

This PR enables querying policy-enabled tables in MSQ.

Key changed/added classes in this PR

DataSourceAnalysis: getBaseTableDataSource can now return the base of a RestrictedDataSource. This is more robust than using the underlying table as the base.
MSQTaskQueryMaker: now adds policies to the query instead of throwing a permission error.
DataSourcePlan: can now handle RestrictedDataSource.
RestrictedInputNumberDataSource: a new class that wraps an InputNumberDataSource with a policy; its SegmentMapFn can be used to create a RestrictedSegment.
RunWorkOrder: a few refactors to make the code clearer, with no behavior change. In ShufflePipelineBuilder.build(), it was not clear before that the channel future should only be returned once the resultFuture is ready. The sanity check has also moved to OutputChannels.
Added tests in MSQSelectTest, MSQReplaceTest, MSQInsertTest, MSQExportTest.
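
As an illustration of the wrapping idea described for RestrictedInputNumberDataSource, here is a minimal, self-contained sketch. It is not the Druid implementation: Segment, RestrictedSegment, RestrictedInputNumber and createSegmentMapFunction below are simplified hypothetical stand-ins, and the policy is modeled as a plain row predicate. The only point is that the segment map function wraps every segment it is given, so the policy travels with the datasource instead of being applied up front.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class RestrictedSegmentMapSketch
{
  // Hypothetical stand-in for a segment: just a list of rows.
  interface Segment
  {
    List<String> rows();
  }

  // Hypothetical stand-in for a restricted segment: applies the policy on read.
  static final class RestrictedSegment implements Segment
  {
    final Segment delegate;
    final Predicate<String> policy;

    RestrictedSegment(Segment delegate, Predicate<String> policy)
    {
      this.delegate = delegate;
      this.policy = policy;
    }

    @Override
    public List<String> rows()
    {
      return delegate.rows().stream().filter(policy).collect(Collectors.toList());
    }
  }

  // Hypothetical stand-in for the wrapper datasource: an input number plus one policy.
  static final class RestrictedInputNumber
  {
    final int inputNumber;
    final Predicate<String> policy;

    RestrictedInputNumber(int inputNumber, Predicate<String> policy)
    {
      this.inputNumber = inputNumber;
      this.policy = policy;
    }

    // The map function wraps each incoming segment with the policy.
    Function<Segment, Segment> createSegmentMapFunction()
    {
      return segment -> new RestrictedSegment(segment, policy);
    }
  }

  public static void main(String[] args)
  {
    Segment raw = () -> Arrays.asList("tenant=a,1", "tenant=b,2", "tenant=a,3");
    RestrictedInputNumber dataSource = new RestrictedInputNumber(0, row -> row.startsWith("tenant=a"));
    Segment restricted = dataSource.createSegmentMapFunction().apply(raw);
    System.out.println(restricted.rows()); // prints [tenant=a,1, tenant=a,3]
  }
}
```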

…le policy restriction in MSQ.

clintropolis

I had a first pass and have some questions and thoughts.

Also, maybe you could try to avoid reformatting entire files; all of these unrelated formatting changes make review harder than it should be. I know it's just the tooling doing it to adhere to the style rules, but my preference at least would be to make these cosmetic changes, as you notice them, in a standalone PR to keep reviews simple.

extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/RunWorkOrder.java

clintropolis · 2025-01-29T23:21:03Z

...ore/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/MSQWorkerTaskLauncher.java

-      workerToTaskIds.compute(i, (workerId, taskIds) -> {
-        if (taskIds == null) {
-          taskIds = new ArrayList<>();
-        }
-        taskIds.add(task.getId());
-        return taskIds;
-      });
+      workerToTaskIds.computeIfAbsent(i, (unused) -> (new ArrayList<>())).add(task.getId());


this isn't equivalent, previously it would always add the taskId to the worker, now it only adds if the worker isn't there, is that ok?

Yeah this looks weird.
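
For reference, a minimal sketch of how the two forms behave on a plain HashMap. The class and method names are hypothetical and the real type of workerToTaskIds is not shown in the diff, so this only illustrates the documented java.util.Map semantics: computeIfAbsent returns the existing value when the key is present (or the newly computed one when it is absent), so the chained add() runs in both cases. One difference the sketch does not cover: if the map is a concurrent map, the add() inside compute() runs as part of the atomic update, whereas the add() chained after computeIfAbsent() runs outside it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComputeIfAbsentDemo
{
  // Original form: create the list if missing, then always append.
  static void addWithCompute(Map<Integer, List<String>> map, int worker, String taskId)
  {
    map.compute(worker, (workerId, taskIds) -> {
      if (taskIds == null) {
        taskIds = new ArrayList<>();
      }
      taskIds.add(taskId);
      return taskIds;
    });
  }

  // Rewritten form: computeIfAbsent returns the existing list, or the newly
  // created one, and add() is then called on whichever list came back.
  static void addWithComputeIfAbsent(Map<Integer, List<String>> map, int worker, String taskId)
  {
    map.computeIfAbsent(worker, unused -> new ArrayList<>()).add(taskId);
  }

  public static void main(String[] args)
  {
    Map<Integer, List<String>> a = new HashMap<>();
    Map<Integer, List<String>> b = new HashMap<>();
    for (String taskId : new String[]{"task-0", "task-1"}) {
      addWithCompute(a, 0, taskId);
      addWithComputeIfAbsent(b, 0, taskId);
    }
    System.out.println(a.equals(b)); // prints true
    System.out.println(b);           // prints {0=[task-0, task-1]}
  }
}
```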

...ge-query/src/main/java/org/apache/druid/msq/querykit/BroadcastJoinSegmentMapFnProcessor.java

clintropolis · 2025-01-29T23:25:53Z

...stage-query/src/main/java/org/apache/druid/msq/querykit/RestrictedInputNumberDataSource.java

+ * join tree.
+ */
+@JsonTypeName("restrictedInputNumber")
+public class RestrictedInputNumberDataSource implements DataSource


should this be InputNumberRestrictedDataSource instead?

It's a preference, but I kind of feel RestrictedInputNumberDataSource puts more emphasis on the Restricted part.

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java

...i-stage-query/src/main/java/org/apache/druid/msq/querykit/BaseLeafFrameProcessorFactory.java

extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/RunWorkOrder.java

processing/src/main/java/org/apache/druid/frame/processor/OutputChannels.java

cryptoe

Could you please move the formatting changes and refactors into another PR? This PR is quite hard to review with all the refactors and formatting changes.

extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/RunWorkOrder.java

...i-stage-query/src/main/java/org/apache/druid/msq/querykit/BaseLeafFrameProcessorFactory.java

cryptoe · 2025-02-10T15:30:01Z

...ore/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/MSQWorkerTaskLauncher.java

-      workerToTaskIds.compute(i, (workerId, taskIds) -> {
-        if (taskIds == null) {
-          taskIds = new ArrayList<>();
-        }
-        taskIds.add(task.getId());
-        return taskIds;
-      });
+      workerToTaskIds.computeIfAbsent(i, (unused) -> (new ArrayList<>())).add(task.getId());


Yeah this looks weird.

...sions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/DataSourcePlan.java

cryptoe · 2025-02-10T15:31:51Z

...stage-query/src/main/java/org/apache/druid/msq/querykit/RestrictedInputNumberDataSource.java

+public class RestrictedInputNumberDataSource implements DataSource
+{
+  private final int inputNumber;
+  private final Policy policy;


Shouldn't this be a list of policies?

We agreed on a single-policy approach per datasource; full discussion here: #17564 (comment)

extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQSelectTest.java

cryptoe · 2025-02-10T15:41:24Z

processing/src/main/java/org/apache/druid/query/JoinDataSource.java

@@ -571,9 +570,6 @@ private static Triple<DataSource, DimFilter, List<PreJoinableClause>> flattenJoi
      } else if (current instanceof UnnestDataSource) {
        final UnnestDataSource unnestDataSource = (UnnestDataSource) current;
        current = unnestDataSource.getBase();
-      } else if (current instanceof RestrictedDataSource) {


Could you please explain the removal here?

This was added in the first security PR, where TableDataSource was used as the base in DataSourceAnalysis. That is not a good approach, because the security filter might get lost in withUpdatedDataSource. In this PR, RestrictedDataSource is used as the base, which guarantees the security filter stays in place.
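
To make the concern concrete, here is a small sketch under stated assumptions. DataSource, Table, Restricted, baseUnwrapped and baseRestricted are simplified hypothetical stand-ins, not the Druid classes, and the policy is modeled as a row predicate. It only illustrates that any caller that continues from an unwrapped table base no longer sees the policy, while a caller that continues from the restricted wrapper does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class RestrictedBaseSketch
{
  // Hypothetical stand-in for a datasource: something that can be scanned for rows.
  interface DataSource
  {
    List<String> scan();
  }

  static final class Table implements DataSource
  {
    final List<String> rows;

    Table(List<String> rows)
    {
      this.rows = rows;
    }

    @Override
    public List<String> scan()
    {
      return rows;
    }
  }

  // Hypothetical stand-in for a restricted datasource: a table plus one policy.
  static final class Restricted implements DataSource
  {
    final Table base;
    final Predicate<String> policy;

    Restricted(Table base, Predicate<String> policy)
    {
      this.base = base;
      this.policy = policy;
    }

    @Override
    public List<String> scan()
    {
      return base.scan().stream().filter(policy).collect(Collectors.toList());
    }
  }

  // Earlier approach (as modeled here): look through the wrapper and report the raw table.
  static DataSource baseUnwrapped(DataSource ds)
  {
    return ds instanceof Restricted ? ((Restricted) ds).base : ds;
  }

  // Approach in this PR (as modeled here): stop at the restricted wrapper.
  static DataSource baseRestricted(DataSource ds)
  {
    return ds;
  }

  public static void main(String[] args)
  {
    DataSource ds = new Restricted(
        new Table(Arrays.asList("tenant=a,1", "tenant=b,2")),
        row -> row.startsWith("tenant=a")
    );
    System.out.println(baseUnwrapped(ds).scan().size());  // prints 2: the policy was dropped
    System.out.println(baseRestricted(ds).scan().size()); // prints 1: the policy still applies
  }
}
```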

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java

extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskQueryMaker.java

kgyrtkirk · 2025-02-12T15:19:04Z

...sions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/DataSourcePlan.java

+    return new DataSourcePlan(
+        (broadcast && dataSource.isGlobal())
+        ? dataSource
+        : new RestrictedInputNumberDataSource(0, dataSource.getPolicy()),


I wonder if it's really necessary to delay the policy evaluation until the point the cursor is created.
If so, wouldn't it be an option to just save the details required to fully interpret the policy, instead of forcing it to be the first parent of TableDataSource? That requires working around things that should already work, and it needs classes that are more or less just copies of others.