Change the way that we carry data in the AST #19
Conversation
Thinking about the ordering of passes. Not all of these passes exist yet; I'll make the ones that don't. Putting this here because I think it will be useful to have this picture in mind when thinking about how exactly to pass information around between passes.

### Passes

#### Parsing (exists)

We start with the raw text of a Python program (either from a function annotator using ...). During this process, each node is given a unique ID.

#### A-Normalization (exists)

This pass converts our program into A-normal form. A-normal form is SSA-lite; every subcomputation is named, but it doesn't do any renaming of existing variables. So if there are two writes to the same variable in a program, we don't change those.
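To make this concrete, here's a rough before/after sketch on the running series example used later in this thread; the `tmp` names and the `fut_anf` name are just illustrative, not what the pass actually generates.

```python
import pandas as pd

# Before A-normalization: the list literal and the negated constant are
# anonymous subexpressions.
def fut():
    s = pd.Series([1, 2, 3])
    return s / -1

# After A-normalization: every subcomputation gets a name, but existing
# variables (like `s`) are not renamed.
def fut_anf():
    tmp0 = [1, 2, 3]
    s = pd.Series(tmp0)
    tmp1 = -1
    return s / tmp1
```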
#### Forward type annotation (doesn't exist)

Mark all the types that we can glean from doing a forward pass. Concretely, we can mark the types of literals and of variables that refer to objects or imports. Maybe there are other things in this list? There's a question of what we should mark for things whose types we don't know. Maybe it should be a fresh type variable.

#### Make type calls explicit (exists)

We should know enough about the program to insert explicit type applications. If we have this type,

```
pd.Series: forall T . List[Refine[T]] -> pd.Series[T]{..}
```

and this program:

```python
def fut():
    s = pd.Series([1, 2, 3])
    return s / -1
```

we can make the type applications explicit:

```python
def fut():
    s = pd.Series[int]([1, 2, 3])
    return s / -1
```

If we can't figure out ...

#### Hindley-Milner (backward phase of type inference)

For the types that we can't figure out during the forward pass, generate constraints from how a variable is used, then use unification to solve these constraints.

#### Liquid Template

Generate fresh liquid variables for every type in the program.

#### Liquid Constraint Generation

Generate liquid constraints by walking the program.

#### Liquid Predicate Abstraction

Solve for the liquid type constraints by using predicate abstraction.
---

### How should we handle `self` parameters?

Consider the following example:

```python
# in context
s: pd.Series[int] = pd.Series([1, 2, 3])

val = s.max()
```

The ...

#### Option 1: Rewrite the program

The first option is that we can rewrite the program to make the ...:

```python
val = s.max()
# becomes
val = pd.Series.max(s)
```

This is nice because then we don't have to have special logic to handle the `self` case whenever we need to handle function calls. However, I think that we would have to implement this during forward type annotation. That pass is starting to feel like it's doing too many things. Maybe that's ok for now.

#### Option 2: Special handling

Another option is just to check if the first parameter is named `self` and has a ...

#### Conclusion (for now)

I think that for now, I'm going to implement option 1. It seems like it will be simpler.
---

Ended up going with Option 2. Option 1 was hard because I couldn't implement the transformation in ... (... annotation is probably the wrong place to do it).
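For the record, here is a minimal sketch of what the Option 2 special case can look like when handling a call; `Param`, `FunctionType`, and `arguments_for_call` are made-up stand-ins, not classes that exist in this codebase.

```python
from dataclasses import dataclass
from typing import Any, List

# Hypothetical stand-ins for the real AST / type machinery.
@dataclass
class Param:
    name: str
    type: Any

@dataclass
class FunctionType:
    params: List[Param]
    ret: Any

def arguments_for_call(receiver: Any, args: List[Any], fn_type: FunctionType) -> List[Any]:
    """Option 2: if the callee's first parameter is named `self`, treat the
    receiver expression as that first argument instead of rewriting the
    program into the `pd.Series.max(s)` form from Option 1."""
    if fn_type.params and fn_type.params[0].name == "self":
        return [receiver] + list(args)
    return list(args)

# e.g. for `s.max()`: max's first parameter is `self`, so `s` becomes the first argument
max_type = FunctionType(params=[Param("self", "pd.Series[int]")], ret="int")
print(arguments_for_call("s", [], max_type))  # ['s']
```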
---

I wanted to put a derivation explaining how ...

```python
def __mul__(self: pd.Series[int], other: int) -> pd.Series[{int | V == self * other}]:
    ...
```

However, the return type here is not well typed in our predicate language. We don't have a notion of multiplication between a series and an int. We could hard-code these semantics, but this is just hard-coding the logic into the SMT translation. Instead, I think that it is simpler to have a "canonical" element function:

```python
def el(self: pd.Series[{int | K}]) -> {int | K}:
    ...
```

The implementation of this function doesn't matter for our purposes. It would be reasonable to just hard-code this function, I think. Here's a derivation of how using this function would work during liquid inference.

```
# we are trying to find some K that generalizes V == el(a) * b
pd.Series[{int | V == el(a) * b}] <: pd.Series[{int | K}]

# we have the following things in our context
a: pd.Series[{int | V > 0}]
b: {int | V < 0}
|- {int | V == el(a) * b} <: {int | K}

# the use of el(a) effectively introduces a new element into the context
# we have the following things in our context
a: pd.Series[{int | V > 0}]
el(a): {int | V > 0}
b: {int | V < 0}
|- {int | V == el(a) * b} <: {int | K}

# from the normal liquid inference algo, we subst to get concrete assumptions
a: pd.Series[{int | V > 0}]
{int | el(a) > 0}
{int | b < 0}
|- {int | V == el(a) * b} <: {int | K}

# now we can perform predicate abstraction
el(a) > 0 && b < 0 && V == el(a) * b ==> V < 0
...

# K := V < 0 is a valid solution
```
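As a sanity check, that final implication can be handed to an SMT solver directly. Here's a small check using the z3-solver Python package (just for illustration; this isn't wired into the codebase), with `el_a` standing in for the uninterpreted term `el(a)`:

```python
from z3 import And, Implies, Ints, prove

# el_a stands in for el(a); V is the refinement value variable.
el_a, b, V = Ints("el_a b V")

# el(a) > 0 && b < 0 && V == el(a) * b ==> V < 0
prove(Implies(And(el_a > 0, b < 0, V == el_a * b), V < 0))  # prints: proved
```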
---

Thinking about how we want to templatize ... For this running series example, we generate this program after the type annotation + inference phase:

```python
def fut() -> pd.Series[int]:
    tmp0: List[int] = [1, 2, 3]
    s: pd.Series[int] = pd.Series[int](tmp0)
    tmp1: int = -1
    return s * tmp1
```

We want to generate this program:

```python
def fut() -> pd.Series[Refine[int, "K2"]]:
    tmp0: py.List[Refine[int, "K0"]] = [1, 2, 3]
    s: pd.Series[Refine[int, "K1"]] = pd.Series[Refine[int, "K1"]](tmp0)
    tmp1: Refine[int, V == -1] = -(1)
    return s * tmp1
```

with the following set of constraints:

```
V == 1 <: K0
V == 2 <: K0
V == 3 <: K0
K0 <: K1
pd.Series[int, V == el(s) * tmp1] <: pd.Series[int, K2]
```

This seems straightforward enough. Let's think about using ...:

```python
def fut() -> pd.Series[int]:
    tmp0: List[int] = [1, 2, 3]
    s: pd.Series[int] = pd.Series[int](tmp0)
    tmp1: int = s.max()
    tmp2: int = -1
    return tmp1 * tmp2
```

We should templatize that program into:

```python
def fut() -> pd.Series[Refine[int, K2]]:
    tmp0: List[Refine[int, K0]] = [1, 2, 3]
    s: pd.Series[Refine[int, K1]] = pd.Series[Refine[int, K1]](tmp0)
    tmp1: Refine[int, K1] = s.max()
    tmp2: Refine[int, V == -1] = -1
    return tmp1 * tmp2
```

Constraints:

```
V == 1 <: K0
V == 2 <: K0
V == 3 <: K0
K0 <: K1
V == tmp1 * tmp2 <: K2
```

The interesting bit of that program is that the refinement for ...
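Setting that aside, here's a rough sketch of the shape of the templatization step itself; the `Base`/`Refine`/`Series` classes are a made-up type representation, not the project's actual one:

```python
from dataclasses import dataclass
from itertools import count
from typing import Union

# Hypothetical type representation, just to show the shape of templatization.
@dataclass
class Base:
    name: str                 # e.g. int

@dataclass
class Refine:
    base: Base                # e.g. Refine[int, "K0"]
    pred: str                 # a liquid variable or a concrete predicate

@dataclass
class Series:
    elem: Union[Base, Refine]  # e.g. pd.Series[...]

_fresh = count()

def templatize(t: Union[Base, Refine, Series]) -> Union[Refine, Series]:
    """Wrap every base type in a refinement over a fresh liquid variable."""
    if isinstance(t, Base):
        return Refine(t, f"K{next(_fresh)}")
    if isinstance(t, Series):
        return Series(templatize(t.elem))
    return t

# pd.Series[int] becomes pd.Series[Refine[int, "K0"]]
print(templatize(Series(Base("int"))))
```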
---

### Thinking through how to templatize functions / object constructors

We have the type ... `s = pd.Series([1, 2, 3])` because we are passing in a ... `List[{int | K0}] -> pd.Series[{int | K1}]`. This could be correct if we also generate the constraint ...

```python
def __init__(self, l: List[int]):
    self.data = [1, 2]
```

The ... My initial instinct was that we should always templatize ...
---

Can I ask what is probably a stupid question about the first example you gave? In the case where ... If so, would something like a ... ?
---

If I understand your question correctly, I think the answer is yes. Though I'm not really sure what you are saying about path constraints.

Suppose you know ahead of time (because you annotated it or something) that the type of the constructor is ... Then the question is: does the following type check?

```python
def __init__(self, l: List[{int | K}]):
    self.data = [1, 2]
```

Python constructors aren't allowed to return anything. So we need some way of tying the types of input parameters to object fields. Maybe something like this:

```python
class pd.Series(Generic[K]):
    data: List[{int | K}]

    def __init__(self, l: List[{int | K}]):
        self.data = l
```

Here, the class is the thing abstracting over refinement variables. So when you instantiate it with some list like so: ...

Ok, then back to the example where we ignore the input list ...

And finally, about your thought of having some kind of ...:

```python
class pd.Series(Generic[K0, K1]) where K0 <: K1:
    data: List[{int | K1}]

    def __init__(self, l: List[{int | K0}]):
        self.data = l
```

This was rather rambly, but I hope this maybe answered your question?
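To tie this back to the constraints earlier in the thread: under the single-`K` version, a call like `pd.Series([1, 2, 3])` would induce one flow constraint per element. A toy sketch (the helper and the constraint strings are just illustrative):

```python
from typing import List

def constraints_for_constructor_call(elements: List[int], k: str = "K1") -> List[str]:
    """Each literal element's singleton refinement must flow into K, the
    refinement variable chosen for this instantiation of the class."""
    return [f"V == {e} <: {k}" for e in elements]

print(constraints_for_constructor_call([1, 2, 3]))
# ['V == 1 <: K1', 'V == 2 <: K1', 'V == 3 <: K1']
```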
---

Finally have this in a state that I think is mergeable. Tests are working, and there are minimal hacks. By mainlining this now, I think it'll be easier to work in parallel and in smaller chunks.

Also, this is the most poorly named issue+PR+branch in existence. This started out as a simple move away from mutating state in our internal AST and ended up as close to a complete reworking of most of the core code.
---

State is carried in the AST as mutable fields. This introduces a class of bugs where state is not properly initialized in places where it is used.

I propose that we give each AST node a unique id at creation time. Then, we can carry state in a separate Map data structure that maps `id -> state`. This has several benefits.

- Maps have type `id -> state` instead of `id -> state | None`.
- We can have a pass `Infer` that annotates every node in the AST with a type. This pass will produce a `Map[Id, Type]`. If some later pass `P` requires this information, we can require that, in order to construct a visitor implementing pass `P`, we have to pass in a `Map[Id, Type]`.

We will have to have a mechanism for mutating / creating new nodes. What ids should these new nodes use? Sometimes they probably want to inherit the id of the node that they are replacing. If we are adding a new node, we may need to mint a new Id.
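A minimal sketch of the proposed shape, assuming made-up `Node`/`Name` classes and a toy `Infer` pass (the real AST and type representations will differ): ids are minted at node creation, a pass returns a map keyed by id, and a later pass has to be constructed with that map.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict

_ids = count()

@dataclass
class Node:
    # every node gets a unique id at creation time
    id: int = field(default_factory=lambda: next(_ids), init=False)

@dataclass
class Name(Node):
    value: str = ""

class Infer:
    """A pass that annotates nodes with types, producing a Map[Id, Type]."""
    def run(self, root: Node) -> Dict[int, str]:
        return {root.id: "int"}  # toy result

class LaterPass:
    """A pass that needs type information must be constructed with the map,
    so the dependency is explicit rather than a possibly-missing mutable field."""
    def __init__(self, types: Dict[int, str]):
        self.types = types

node = Name("x")
later = LaterPass(Infer().run(node))
print(later.types)  # {0: 'int'}
```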