-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement function inlining. #7713
Conversation
6dbdee8
to
fe66209
Compare
I think it would be useful to try a 2-argument function as part of your early tests |
72c96d7
to
0a659e2
Compare
11c9d17
to
7e61c76
Compare
0a659e2
to
000c220
Compare
edb/edgeql/compiler/stmtctx.py
Outdated
env = context.Environment( | ||
schema=schema, | ||
options=options, | ||
path_scope=( | ||
inlining_context.path_scope | ||
if inlining_context is not None else | ||
None | ||
), | ||
alias_result_view_name=options.result_view_name, | ||
scope_tree_nodes=( | ||
inlining_context.env.scope_tree_nodes | ||
if inlining_context is not None else | ||
None | ||
), | ||
) | ||
ctx = context.ContextLevel( | ||
None, | ||
context.ContextSwitchMode.NEW, | ||
env=env, | ||
aliases=( | ||
inlining_context.aliases | ||
if inlining_context is not None else | ||
None | ||
), | ||
scope_id_ctr=( | ||
inlining_context.scope_id_ctr | ||
if inlining_context is not None else | ||
None | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the existing context, with appropriate new scopes applied to it.
Doing it in a completely separate context is not going to work at all in anything but the most simple situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And especially the same environment. (Or at least, an almost identical environment, since we might need to change options or something, though that is a little fraught too.)
edb/ir/ast.py
Outdated
class InlinedParameter(ImmutableBase): | ||
|
||
# Parameter is used for both function parameters and runtime parameters. | ||
# If a function is inlined in compile_FunctionCall, this is used to | ||
# specify the function's parameters. | ||
|
||
# During function compilation, arguments are converted from names to either | ||
# int for positional argument or str for named argument. | ||
arg_key: int | str | ||
|
||
required: bool | ||
is_global: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more typical way to do something like this in our compiler would probably be to substitute in irast.Set
objects that have the same path_id
as as the arguments but have an irast.RefExpr
as their expr
, to indicate that they are bound somewhere.
They should then get picked up by their path id when doing the backend compilation, without needing to add an extra map.
7e61c76
to
77b42b2
Compare
3be4bf1
to
52c32b1
Compare
67f672b
to
bf311de
Compare
1783407
to
3598bec
Compare
37ee25c
to
b85e67a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks pretty good.
I think we need to share more of env and ctx when compiling the function body, though.
I think there are some things that are being done mostly to workaround not sharing enough state, and a bunch more will come up when doing DML or when writing other sorts of tests.
edb/edgeql/compiler/func.py
Outdated
context=sd.CommandContext( | ||
# Probably not correct. Need to store modaliases while compiling | ||
# functions? | ||
modaliases=ctx.modaliases, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modaliases shouldn't be inherited by the function body
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though it also doesn't matter, since functions are normalized into only having fully qualified names
edb/edgeql/compiler/func.py
Outdated
optional=arg.param_typemod == ft.TypeModifier.OptionalType, | ||
optional=( | ||
arg.param_typemod == ft.TypeModifier.OptionalType | ||
or arg.is_default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.... Is adding arg.is_default
fully necessary here? I think in some cases it might result in worse codegen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a consequence of how has_inlined_defaults
interacted with the rest of the inlining. The simplest solution seems to be to just disable inlining defaults when inlining the function body.
edb/edgeql/compiler/func.py
Outdated
else: | ||
return arg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this case for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default values get added to inlined_args
as a irast.Set
instead of CallArg
. In that case, the default value is inlined directly. This should work because defaults are always constants.
class ArgumentInliner(ast.NodeTransformer): | ||
|
||
mapped_args: dict[irast.PathId, irast.PathId] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize now that it should be possible (and probably nicer) to put the inlined arguments in the context somehow while compiling the inlined function, and inserting them directly when compiling Parameters, rather then substituting them in after the fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to not go down that path since the backend compiler does stuff with args too. Probably possible though.
if ir.body is not None: | ||
body_volatility = infer_volatility(ir.body, env=env) | ||
if body_volatility > ir.volatility: | ||
raise errors.QueryError( | ||
f'inline function body expression is {body_volatility} ' | ||
f'while the function is declared as {ir.volatility}', | ||
span=ir.span, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to understand while this is happening better. (Though it's not super critical)
edb/pgsql/compiler/relgen.py
Outdated
if pathctx.maybe_get_path_var( | ||
stmt, prefix_path_id, aspect=pgce.PathAspect.VALUE, env=ctx.env | ||
) is None: | ||
assert isinstance(source_rvar.query, pgast.Query) | ||
# When inlining functions, the prefix path will be the argument | ||
# path id, while ir_source.path_id will be the "correct" path id. | ||
# Remap the argument path id here to ensure the path bond is | ||
# able to find the source rvar when joining. | ||
prefix_path_id = pathctx.map_path_id( | ||
prefix_path_id, source_rvar.query.view_path_id_map | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't feel like it /should/ be necessary; it seems like it should be fixable on the edgeql->IR side.
Think about it a little more, and then update the comment to say it is a HACK we should revisit if it doesn't seem easy to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most straightforward solution here was to just update the paths of all pointers in the edgeql->ir side instead of trying to rely on the path mapping.
edb/pgsql/compiler/relgen.py
Outdated
with newctx.subrel() as arg_ctx: | ||
args = _compile_call_args(ir_set, ctx=arg_ctx) | ||
arg_rvar = relctx.rvar_for_rel(arg_ctx.rel, ctx=ctx) | ||
for ir_arg in expr.args.values(): | ||
arg_path_id = ir_arg.expr.path_id | ||
relctx.include_rvar( | ||
newctx.rel, | ||
arg_rvar, | ||
arg_path_id, | ||
ctx=newctx, | ||
) | ||
if arg_scope_stmt := relctx.maybe_get_scope_stmt( | ||
arg_path_id, ctx=newctx | ||
): | ||
# The rvar is joined to newctx.rel, but other sets may | ||
# look for it in the scope statement. Make sure it's | ||
# available. | ||
pathctx.put_path_value_rvar( | ||
arg_scope_stmt, | ||
arg_path_id, | ||
arg_rvar, | ||
) | ||
|
||
if expr.inline_arg_path_ids: | ||
for param_path_id, arg_path_id in ( | ||
expr.inline_arg_path_ids.items() | ||
): | ||
pathctx.put_path_id_map( | ||
arg_ctx.rel, | ||
param_path_id, | ||
arg_path_id | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably wants to be its own function, because otherwise it drowns out the common case of just calling _compile_call_args
edb/schema/functions.py
Outdated
# Infer cardinality, multiplicity, and volatility | ||
inf_ctx = inference.make_ctx(env=ctx.env) | ||
inference.infer_cardinality( | ||
ir_set, scope_tree=ctx.path_scope, ctx=inf_ctx | ||
) | ||
inference.infer_multiplicity( | ||
ir_set, scope_tree=ctx.path_scope, ctx=inf_ctx | ||
) | ||
inference.infer_volatility(ir_set, env=ctx.env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't need to do this? It'll all get done on the inlined body, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think the reason why it doesn't work without inferring them here is because not enough of ctx/env are being shared
edb/edgeql/compiler/context.py
Outdated
set_types: Optional[Dict[irast.Set, s_types.Type]] = None, | ||
scope_tree_nodes: Optional[ | ||
MutableMapping[int, irast.ScopeTreeNode] | ||
] = None, | ||
type_rewrites: Optional[dict[ | ||
Tuple[s_types.Type, bool], | ||
irast.Set | None | Literal[True] | ||
]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing a lot of fields that need to be preserved in Environment for correct operation in some case or another, especially once DML is supported.
Definitely needed:
- materialized_sets
- dml_exprs
- dml_stmts
- dml_rewrites
- schema_refs
I think possibly needed, or needed if things are done a little differently:
- pointer_derivation_map (for card inference?)
- pointer_specified_info (for card inference?)
- query_globals
Plus, I think having all of the caches shared will help get more cache hits, even if it isn't necessary.
And I think many of the other things wouldn't hurt to have be shared.
So I think instead of constructing a fully new env, we should probably make a straight copy of the env and just update the few fields we care about changing.
It might /just/ be options
we need to change, though? In which case it might actually just be best to update the options
field and then restore it after.
ctx = stmtctx.init_context( | ||
schema=schema, | ||
options=get_compiler_options( | ||
schema, | ||
context, | ||
func_name=func_name, | ||
params=params, | ||
track_schema_ref_exprs=track_schema_ref_exprs, | ||
inlining_context=inlining_context, | ||
), | ||
inlining_context=inlining_context, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to go through init_context
here? I think we probably should be trying to share the context mostly. (I think it probably would work if we create a detached()
context?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to see if I could get a detached context working but I think the inlined context needs to have more than what is available in a detached context.
b85e67a
to
97908c6
Compare
3e5230d
to
45dbd62
Compare
async def test_edgeql_functions_inline_object_04(self): | ||
await self.con.execute(''' | ||
create type Bar { | ||
create required property a -> int64; | ||
}; | ||
insert Bar{a := 1}; | ||
insert Bar{a := 2}; | ||
insert Bar{a := 3}; | ||
create function foo(x: Bar) -> int64 { | ||
set is_inlined := true; | ||
using (x.a); | ||
}; | ||
''') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do a test like
create function foo() -> int64 {
set is_inlined := true;
using (count(Bar));
};
and then query
select (Bar, foo())
and make sure that we get 3
back as the result from foo()
.
I want to make sure that paths in the inlined functions don't end up getting captured by factoring with the enclosing query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good, though I'm worried about the path factoring issue I asked for a test for, which we'll need to talk about
edb/edgeql/compiler/__init__.py
Outdated
@@ -201,6 +202,7 @@ def compile_ast_to_ir( | |||
*, | |||
script_info: Optional[irast.ScriptInfo] = None, | |||
options: Optional[CompilerOptions] = None, | |||
inlining_context: Optional[context.ContextLevel] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't use these entrypoints anymore, right? Can we drop the inlining_context
from them?
Great work! |
8af3eb0
to
d19da8a
Compare
Allows for the ir ast trees to be directly embedded into the tree of a calling expression.
This is accomplished by:
compile_function_inline
to do only the necessary steps, excluding things materializing sets.irast.Parameter
nodes from the function with a newirast.InlinedParameterExpr
irast.Parameter
is used for runtime parameters when compiling queries.related #7692
Note: The volatility check for inlined function bodies was removed. This is because functions which use globals are
immutable
, since they expect a globals json to be passed in. This is not true of their inlined bodies, which becomestable
.