Skip to content

prepare planner for query rewrites #3321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

normen662
Copy link
Contributor

@normen662 normen662 commented Apr 23, 2025

This PR prepares the Cascades Planner for multiple planner stages. As only the last such stage produces RecordQueryPlan, we need to remove every explicit reference to RecordQueryPlan from the innards of the planner. This PR specifically changes the PropertiesMap that uses RecordQueryPlan with a generic version that can either use RelationalExpression or RecordQueryPlan based on which planner stage we might be in.
The second big change concerns the way properties are maintained and how state management works for visitors that are used by properties. Currently, there are properties which can be computed on plans only and others which can be computed on any kind of expression. There were also differences in how properties managed their state between different properties. The following changes were made:

  1. properties are singletons (or multitons; enumerated singletons)
  2. properties can create a visitor using createVisitor()
  3. all properties use visitors that work on all kinds of expressions
  4. properties that are not meaningful for all kinds of expressions (plans vs all expressions) can refuse to return a proper result for those (by throwing a RecordCoreException)
  5. see SimpleExpressionVisitor and ExpressionProperty.toExpressionVisitor(..) for details on how to transform a plan visitor to a generic expression visitor and how to transform an old-style simplistic expression visitor.
  6. previous classes called ...Property are not properties anymore if they cannot be expressed using 1. and 2. Those properties have been demoted to visitors

@normen662 normen662 requested review from alecgrieser and hatyo April 23, 2025 16:31
@normen662 normen662 added the enhancement New feature or request label Apr 24, 2025
Copy link
Collaborator

@alecgrieser alecgrieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things. I was also looking at the coverage information (here: https://github.com/FoundationDB/fdb-record-layer/actions/runs/14622927029/attempts/1#summary-41029106449 I also needed to download the coverage-report for more details). There are certainly a few coverage gaps highlighted here, mainly in some of the rule implementations, which isn't amazing given the nature of the refactoring, but maybe it's something we can live with, as long as your pretty confident that the methods were just moved. Some of the things highlighted are just that the new evaluate method taking a Reference aren't called, which...is probably fine, I guess. For the plans, the main thing that coverage is highlighting is that some of the rules are never called, but I'm not particularly concerned with the rule changes per se, as we can be pretty confident they are correct as long as the underlying property changes are correct

import java.util.Objects;

/**
* An interface for certain Cascades-style properties, which are measurable features of an expression other than the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Javadoc seems to be copied over from ExpressionProperty (presumably, it came along when that file was renamed), and I think it's mostly out of date. I think the good news is that this is a more straightforward class, and it probably doesn't need that much documentation. It may be a good idea to point out that ExpressionProperty implementations (often) use this kind of visitor when evaluated.

It looks like the javadoc on the methods here also reference property in a way that we probably want to update.

* @param <T> the result type of the property
*/
@API(API.Status.EXPERIMENTAL)
public interface SimpleExpressionVisitor<T> extends RelationalExpressionVisitorWithDefaults<T> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the rename, are there any changes here? Comparing with the old ExpressionProperty class, it didn't seem like it, but I wanted to be sure. I guess it now extends RelationalExpressionVisitorWithDefaults instead of RelationalExpressionVisitor. I guess that's fine, other than that it may make it more likely that someone who adds a RelationalExpression type forgets to update the appropriate visitor implementations.

*
* @param <T> the result type of the property
* Base interface to capture attributes for expressions.
* An instance of this usually class serves as a key in maps much like an enum, but provides strong typing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth copying over some of the description of what a property is from the PR description into this class.

* <br>
* Properties for plans managed by this map are computed lazily when a caller attempts to retrieve the value of a property.
* The reason for that is twofold. First, we want to avoid unnecessary computation of a property if it is not retrieved
* at a later point in time. Second, the basic planner {@link RecordQueryPlan} uses a simplified way of creating a dag of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* at a later point in time. Second, the basic planner {@link RecordQueryPlan} uses a simplified way of creating a dag of
* at a later point in time. Second, the basic planner ({@link RecordQueryPlanner}) uses a simplified way of creating a DAG of

This is mainly about the RecordQueryPlan to RecordQueryPlanner, which I think is what you meant here. It may be that the Javadoc needs the fully qualified name there to compile (unless the imports are also changed).

I could also be convinced that this is now mostly duplicative of the javadoc for ExpressionPropertiesMap, and this whole description should be simplified to something like, "An {@link ExpressionPropertiesMap} for properties that only apply to {@link RecordQueryPlan}s", and then we add @see ExpresssionPropertiesMap to the bottom of the description.

public static class NormalizedResidualPredicateVisitor implements SimpleExpressionVisitor<QueryPredicate> {
@Nonnull
@Override
public QueryPredicate visitRecordQueryUnionOnValuesPlan(@Nonnull final RecordQueryUnionOnValuesPlan unionPlan) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unchanged, but this doesn't have a special case for LogicalUnionExpression. But I also take it that this predicate may not make sense on logical expressions, in general, as evaluateAtRef expects the number of variations in the reference to be one anyway. If we still have some properties that only apply to plans, I could see that suggesting that there is some benefit to a PlanProperty--but I'm not sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only called from the cost model, thus it only needs to work over references containing exactly one RecordQueryPlan.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That checks out, though it would be nice if we could encode that in some way. I see that we're already asserting that the reference only has one member in evaluateAtRef, but it sort of seems like we're missing some kind of abstraction, like a PlanOnlyVisitor abstraction or something that encodes that kind of information. That doesn't need to be this PR.

@Override
public Set<String> evaluateAtExpression(@Nonnull RelationalExpression expression, @Nonnull List<Set<String>> childResults) {
// shouldVisit() ensures that we only visit relational planner expressions
// If we mess this up, better to find out sooner rather than later.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was there in the old code, but I'm not sure I understand it. What does it mean by "relational" planner expression here? That is, does it think there's some kind of RelationalExpression that is not a "relational planner expression"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make this explanation better. In short, there is a kind of visitor (which does not really follow the strict visitor pattern and which predates the visitors that extend the code-generated visitors) that allows you to traverse the graph while skipping subgraphs. This is what the shouldVisit() indicates. If shouldVisit() is true, this method must return something useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, an improved explanation here would be nice to have. It's a little awkward, given that shouldVisit appears to always return true for everything, if my code searching is correct. Maybe we should just remove it? (Not this PR)

// shouldVisit() ensures that we only visit relational planner expressions
// If we mess this up, better to find out sooner rather than later.

if (expression instanceof RecordQueryScanPlan) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type ladder is also a bit weird (instead of using the various visitX methods). It's not something that changed, I suppose, other than placing it in a new place. There is some advantage to this approach, in that it ensures that the traversal is post-order, I guess, rather than having expressions with children need to traverse down their children manually

@@ -82,7 +82,7 @@ public void onMatch(@Nonnull final CascadesRuleCall call) {
final var unsatisfiedMapBuilder = ImmutableMultimap.<Set<String>, RecordQueryPlan>builder();

for (final var innerPlan : planPartition.getPlans()) {
final var childRecordTypes = RecordTypesProperty.evaluate(call.newAliasResolver(), innerPlan);
final var childRecordTypes = new RecordTypesVisitor(call.newAliasResolver()).visit(innerPlan);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this is constructing a new ReocrdTypesVisitor directly instead of evaluating the property on the inner plan? Is it just the alias resolver? And is that actually needed for the property evaluation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The property cannot be used as the visitor needs an argument.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's part of my question: does this really need the argument?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I guess I'm fine with leaving the argument in under the doctrine of making minimal changes

* </p>
*/
@API(API.Status.EXPERIMENTAL)
public class ExplainPlanProperty implements ExpressionProperty<ExplainTokens> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not too much work, I think undoing this to keep explain visitor based is a good idea. I think it keeps the property/visitor distinction that you're introducing here a little cleaner

public static class NormalizedResidualPredicateVisitor implements SimpleExpressionVisitor<QueryPredicate> {
@Nonnull
@Override
public QueryPredicate visitRecordQueryUnionOnValuesPlan(@Nonnull final RecordQueryUnionOnValuesPlan unionPlan) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That checks out, though it would be nice if we could encode that in some way. I see that we're already asserting that the reference only has one member in evaluateAtRef, but it sort of seems like we're missing some kind of abstraction, like a PlanOnlyVisitor abstraction or something that encodes that kind of information. That doesn't need to be this PR.

@@ -82,7 +82,7 @@ public void onMatch(@Nonnull final CascadesRuleCall call) {
final var unsatisfiedMapBuilder = ImmutableMultimap.<Set<String>, RecordQueryPlan>builder();

for (final var innerPlan : planPartition.getPlans()) {
final var childRecordTypes = RecordTypesProperty.evaluate(call.newAliasResolver(), innerPlan);
final var childRecordTypes = new RecordTypesVisitor(call.newAliasResolver()).visit(innerPlan);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's part of my question: does this really need the argument?

@Override
public Set<String> evaluateAtExpression(@Nonnull RelationalExpression expression, @Nonnull List<Set<String>> childResults) {
// shouldVisit() ensures that we only visit relational planner expressions
// If we mess this up, better to find out sooner rather than later.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, an improved explanation here would be nice to have. It's a little awkward, given that shouldVisit appears to always return true for everything, if my code searching is correct. Maybe we should just remove it? (Not this PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants