User facing API for specifying linear models terms #731

Closed
lorentzenchr opened this issue Nov 7, 2023 · 2 comments

Comments

@lorentzenchr
Contributor

I've seen that glum version 3 will get a formula interface, much like R's glm, built on formulaic. This is a great step toward better usability.

I wanted to gauge the appetite for yet another way to specify models, based on the following requirements:

  • High-level interface, much like Wilkinson formulae
    No scikit-learn pipeline needed.
  • (Some) autocompletion support / programmatic approach
    formulaic uses a string, so there is no autocompletion.
  • Context free
    formulaic saves the current scope / context.
  • Specify penalties
    It would be nice to be able to specify penalties per term, e.g., an L2-difference penalty for a B-spline, an L2 penalty for one categorical feature, and a group L2 or group L1 penalty for another categorical feature. A more sophisticated example would be a geo-penalty.
@MatthiasSchmidtblaicherQC
Contributor

MatthiasSchmidtblaicherQC commented Feb 12, 2024

Thanks! I am also excited for the formulaic-based formula interface to be released in v3 as a tool for fast exploratory model building.

In my opinion, there is still a lot of room for development within the formulaic-based framework: one can add stateful transforms, modify the tabmat materializer, or add features to formulaic itself. Therefore, I would first try out and optimize the formulaic-based framework for some time and only later assess whether a third way of specifying models is warranted.
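For readers unfamiliar with the term: a stateful transform learns something from the training data (a mean, knot positions, category levels) and reuses it when the model is applied to new data. The plain-Python sketch below mimics that idea with a hypothetical center function and an explicit state dict; it is a simplified stand-in, not formulaic's actual transform interface.

```python
from typing import Optional


def center(values: list[float], state: Optional[dict] = None) -> list[float]:
    """Subtract the mean. The mean is learned on the first call and stored in
    `state`; later calls with the same `state` reuse it instead of recomputing."""
    if state is None:
        state = {}
    if "mean" not in state:
        state["mean"] = sum(values) / len(values)  # learned at "fit" time
    mean = state["mean"]
    return [v - mean for v in values]


state: dict = {}
train = center([1.0, 2.0, 3.0], state)  # learns mean = 2.0 -> [-1.0, 0.0, 1.0]
new = center([10.0, 12.0], state)       # reuses mean = 2.0 -> [8.0, 10.0]
```

The point of keeping the state is that new data is transformed exactly like the training data; recomputing the mean on [10.0, 12.0] would silently shift the model's inputs.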

As to your points:

Context free

The context can already be turned off by passing an empty dict. We could make this more explicit, e.g., by allowing context=False, at the cost of moving away from formulaic's conventions.
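To illustrate what "context" means here: when a formula names a variable that is not a data column, the library can fall back to a context, for example the caller's scope. The hypothetical resolve helper below (not formulaic's implementation) shows why passing an empty mapping makes evaluation context free: only data columns can be found.

```python
import math
from typing import Any, Mapping, Optional


def resolve(name: str, data: Mapping[str, Any],
            context: Optional[Mapping[str, Any]] = None) -> Any:
    """Look up a formula variable: data columns first, then the context."""
    if name in data:
        return data[name]
    if context is not None and name in context:
        return context[name]
    raise NameError(f"{name!r} is neither a data column nor in the context")


data = {"x": [1.0, 2.0, 3.0]}
captured = {"log": math.log}  # stands in for a captured caller scope

resolve("x", data, {})          # data columns always resolve
resolve("log", data, captured)  # found only via the captured scope
# resolve("log", data, {}) raises NameError: nothing leaks in from outside
```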

Specify penalties

I think that this could be quite interesting. A related feature is smoothness penalties for splines (#471 (comment)). Again, this could be incorporated within the formulaic-based framework. If one wanted to, e.g., be able to specify a penalized spline as something like bs(x, df=4, degree=3, cyclic_penalty=10), then one could write a stateful transform for that penalized spline and adjust the TabmatMaterializer to return a penalty matrix corresponding to the desired penalty.
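As an illustration of the kind of penalty matrix such a materializer might return, here is a minimal sketch of the classic P-spline difference penalty: for spline coefficients b, the penalty is alpha * ||D b||^2 = b^T (alpha D^T D) b, where D takes order-th finite differences of neighbouring coefficients. The cyclic=True variant wraps the differences around, which is one plausible reading of the hypothetical cyclic_penalty above. The function names are illustrative only.

```python
from math import comb


def difference_matrix(k: int, order: int = 2, cyclic: bool = False) -> list[list[int]]:
    """Matrix D whose rows take order-th differences of k coefficients.

    Non-cyclic: k - order rows. Cyclic: k rows, indices wrap modulo k.
    """
    # Finite-difference weights with alternating signs, e.g. order=2 -> [1, -2, 1].
    weights = [(-1) ** (order - j) * comb(order, j) for j in range(order + 1)]
    n_rows = k if cyclic else k - order
    rows = []
    for i in range(n_rows):
        row = [0] * k
        for j, w in enumerate(weights):
            row[(i + j) % k] += w
        rows.append(row)
    return rows


def penalty_matrix(k: int, order: int = 2, alpha: float = 1.0,
                   cyclic: bool = False) -> list[list[float]]:
    """P = alpha * D^T D, so the penalty for coefficients b is b^T P b."""
    d = difference_matrix(k, order, cyclic)
    return [[alpha * sum(row[r] * row[c] for row in d) for c in range(k)]
            for r in range(k)]


def quadratic_form(p: list[list[float]], b: list[float]) -> float:
    """Evaluate b^T P b."""
    n = len(b)
    return sum(b[r] * p[r][c] * b[c] for r in range(n) for c in range(n))


p = penalty_matrix(4, order=2)
# [[1, -2, 1, 0], [-2, 5, -4, 1], [1, -4, 5, -2], [0, 1, -2, 1]] (as floats)
linear = [0.0, 1.0, 2.0, 3.0]
quadratic_form(p, linear)  # 0.0: second differences of a straight line vanish
quadratic_form(penalty_matrix(6, order=2, cyclic=True), [0, 1, 2, 3, 4, 5])  # 72.0
```

With order=2, a linear coefficient sequence is unpenalized in the non-cyclic case, so the penalty shrinks the spline toward a straight line rather than toward zero. In the cyclic case, a linear trend is penalized because the two ends of the basis must join.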

Autocompletion support

I agree that this would probably require a different approach.

I would be curious to know, though, whether you have a specific formula library in mind or are suggesting developing one from the ground up.

@lbittarello
Member

Coming as part of glum 3.
