Skip to content

PoC: Generics and better type inference #58

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

ysthakur
Copy link
Member

@ysthakur ysthakur commented May 10, 2025

This is a proof-of-concept implementing generics and slightly better type inference. It's based on the paper Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism, although I had to make some modifications because their type system is pretty different from ours. I'm not sure yet if the new type inference algorithm will never infer the wrong type, and there's missing pieces here and there. But we should be able to make this general approach to type checking/inference work.

This is what it looks like:

def f<T> [ x: T ] : nothing -> list<T> {
  let z = $x
  [$z]
}
f 123 # Inferred to be of type list<int>

I didn't do subtype/supertype bounds on type parameters, but it shouldn't be terribly hard to implement with the current design.

Other things included in this PR:

  • A top type (supertype of all types), a bottom type (subtype of all types), and intersection types (allof). These are only created internally by the typechecker. Users can't write them themselves, but that's a completely artificial restriction; I just didn't feel like working on the parser again.
  • Better consideration for any and oneof
  • The types of custom command parameters are used when typechecking

Things worsened by this PR:

  • ++ requires both operands to be lists, to match how Nushell now works
  • Type mismatches for binary operators now have less descriptive error messages
  • Repetitive error messages. If you typecheck let x: list<list<int>> = [["foo"]], you'll get 3 error messages instead of just one (one saying "foo" should be an int, one for ["foo"], and one for [["foo"]]).

Things left to do:

  • When getting rid of type variables at the end of typechecking, make sure that the lower bound is actually a subtype of the upper bound

Apologies for the big PR. I should probably have done some of these things before implementing generics. That said, it's just a proof-of-concept, so it doesn't need to be reviewed thoroughly.

How type checking/inference work

The typecheck_expr method takes the expected type of the expression it is currently processing and provides an inferred type for every expression it visits. This way, we both check against the expected type and infer a type for the expression at the same time.

For generics to work, the algorithm requires creating and solving type variables. These type variables have a lower bound and an upper bound. As we move through the program, these bounds are tightened further. At the end of the program, the lower bound of each type variable is chosen as its value.

Every time we come across a call to a custom command with type parameters, we instantiate new type variables corresponding to those type parameters. For example, for the expression f 123 above, we instantiate a new type variable '0 with lower bound bottom and upper bound top (essentially unbounded). Because of the signature of f, we know that 123: '0 and f 123: list<'0>.

So we first call typecheck_expr to check/infer the type of 123, providing '0 as its expected type. Since it's just an integer literal, we infer the type to be int. Then, to ensure that this type matches the expected type ('0), we call the constrain_subtype method to ensure that int <: '0. The existing lower bound for '0 was bottom, so we set the new lower bound to oneof<bottom, int> = int.

Then we set the type of f 123 to list<'0>. After all expressions have been processed, we replace all occurrences of '0 with int. So this becomes list<int>.

todo write about how recursive bounds are avoided

@ysthakur ysthakur changed the title PoC: Generics PoC: Generics and better type inference May 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant