User facing API for specifying linear models terms #731

lorentzenchr · 2023-11-07T11:52:58Z

I've seen that glum version 3 will get a formula interface, much like R's glm, built on formulaic. This is a great step toward better usability.

I wanted to gauge the appetite for yet another way to specify models, based on the following requirements:

- High-level interface, much like Wilkinson formulae
- No scikit-learn pipeline needed
- (Some) autocompletion support / programmatic approach
  (formulaic uses a string, so there is no autocompletion)
- Context-free
  (formulaic saves the current scope/context)
- Specify penalties
  It would be nice to be able to specify penalties per term, e.g. an L2-difference penalty for a B-spline, an L2 penalty for a categorical feature, or a group L2 or group L1 penalty for another categorical feature. More sophisticated: a geo-penalty.
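To make these requirements concrete, here is a minimal sketch of what a programmatic, context-free term specification with per-term penalties could look like. Every class name (`Term`, `Spline`, `Categorical`, `Terms`, `L2`, `DifferenceL2`, `GroupL1`) is hypothetical and is not part of glum or formulaic. Because terms are plain Python objects, an IDE can autocomplete names and keyword arguments, and nothing is resolved from the caller's scope.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# All class names below are hypothetical; they are not part of glum or formulaic.


@dataclass(frozen=True)
class L2:
    """Ridge penalty on the coefficients of one term."""
    alpha: float = 1.0


@dataclass(frozen=True)
class DifferenceL2:
    """L2 penalty on `order`-th differences of adjacent coefficients (e.g. for B-splines)."""
    alpha: float = 1.0
    order: int = 2


@dataclass(frozen=True)
class GroupL1:
    """Group-lasso penalty treating all coefficients of a term as one group."""
    alpha: float = 1.0


Penalty = Optional[Union[L2, DifferenceL2, GroupL1]]


@dataclass(frozen=True)
class Term:
    column: str
    penalty: Penalty = None

    def __add__(self, other):
        # `term + term` starts a Terms collection
        return Terms((self,)) + other


@dataclass(frozen=True)
class Spline(Term):
    df: int = 6
    degree: int = 3


@dataclass(frozen=True)
class Categorical(Term):
    pass


@dataclass(frozen=True)
class Terms:
    terms: Tuple[Term, ...]

    def __add__(self, other):
        # Append a single term or concatenate another Terms collection
        extra = other.terms if isinstance(other, Terms) else (other,)
        return Terms(self.terms + extra)


# Plain Python objects: an IDE can autocomplete class names and keyword arguments,
# and nothing is looked up in the caller's scope.
model = (
    Spline("age", df=8, penalty=DifferenceL2(alpha=10.0, order=2))
    + Categorical("region", penalty=L2(alpha=1.0))
    + Categorical("brand", penalty=GroupL1(alpha=0.5))
)
print([type(t).__name__ for t in model.terms])  # ['Spline', 'Categorical', 'Categorical']
```

A fitting routine would still have to translate each term into design-matrix columns and each penalty into its block of the overall penalty; that step is deliberately left out of this sketch.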

MatthiasSchmidtblaicherQC · 2024-02-12T21:36:13Z

Thanks! I am also excited about the formulaic-based formula interface being released in v3 as a tool for fast exploratory model building.

In my opinion, there is still a lot of room for development within the formulaic-based framework: one can add stateful transforms, modify the tabmat materializer, and add features to formulaic itself. Therefore, I would first try out and optimize the formulaic-based framework for some time and later assess whether a third way of specifying models is warranted.

As to your points:

Context-free

The context can already be turned off by passing an empty dict. We could make this more explicit, e.g., by allowing `context=False`, at the cost of departing from formulaic's conventions.

Specify penalties

I think that this could be quite interesting. A related feature is smoothness penalties for splines (#471 (comment)). Again, this could be incorporated within the formulaic-based framework. For example, to specify a penalized spline as something like `bs(x, df=4, degree=3, cyclic_penalty=10)`, one could write a stateful transform for that penalized spline and adjust the `TabmatMaterializer` to return a penalty matrix corresponding to the desired penalty.
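As a rough illustration of the "penalty matrix" such a materializer could return: a common smoothness penalty for B-spline coefficients (as in P-splines) is α DᵀD, where D takes differences of adjacent coefficients, and a cyclic variant wraps those differences around. The sketch below builds such blocks with NumPy and stacks per-term blocks into one block-diagonal matrix. The function names and the cyclic construction are assumptions for illustration, not glum's or formulaic's API; glum's `GeneralizedLinearRegressor` does, however, accept a full (n_features × n_features) matrix for its `P2` argument.

```python
import numpy as np


def difference_penalty(n_coef: int, order: int = 2, alpha: float = 1.0,
                       cyclic: bool = False) -> np.ndarray:
    """Return alpha * D.T @ D, where D takes `order`-th differences of adjacent coefficients."""
    eye = np.eye(n_coef)
    if cyclic:
        # Circulant first difference w_i - w_{i+1} (wrapping around), raised to `order`
        shift = np.roll(eye, 1, axis=1)
        d = np.linalg.matrix_power(eye - shift, order)
    else:
        # (n_coef - order) x n_coef matrix; rows look like [1, -2, 1, 0, ...] for order=2
        d = np.diff(eye, n=order, axis=0)
    return alpha * d.T @ d


def block_diag(*blocks: np.ndarray) -> np.ndarray:
    """Assemble per-term penalty blocks into one (n_features x n_features) matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    start = 0
    for b in blocks:
        stop = start + b.shape[0]
        out[start:stop, start:stop] = b
        start = stop
    return out


# Hypothetical design: a 6-coefficient spline with a 2nd-order difference penalty,
# followed by a 4-level categorical with a plain ridge penalty.
spline_pen = difference_penalty(6, order=2, alpha=10.0)
cat_pen = 1.0 * np.eye(4)
P2 = block_diag(spline_pen, cat_pen)
print(P2.shape)  # (10, 10)
```

A second-order difference penalty leaves linear trends in the spline coefficients unpenalized (the penalty of `np.arange(6.0)` is zero), which is why it acts as a smoothness penalty rather than shrinking coefficients toward zero.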

Autocompletion support

I agree that this would probably require a different approach.

I would be curious to know, though, whether you have a specific formula library in mind or are suggesting developing one from the ground up.

lbittarello · 2024-04-03T14:32:02Z

Coming as part of Glum 3.

lbittarello closed this as completed Apr 3, 2024

MatthiasSchmidtblaicherQC mentioned this issue on Apr 12, 2024 in glum v3.0 #677 (merged).
