Feature Enhancement Proposal: Wilkinson Formulas and New Model Interfaces #101
Comments
As an update to this, I've been cleaning up the formula parser and writing tests for it. I expect to have it fully prepared for review in the next few weeks. Among other things, I've written the code to dispatch to combo models and made it so the default behavior of

Regarding the API changes, I wanted to elaborate more on some of the issues I've been encountering.
The designs I'm working on in the
The |
Hi Tyler, I've finally managed to go over your fork and see part of what you are working on. As mentioned on #95, this topic has been under discussion within

There are several complications to the application of formulas within our context. For example: the use of spatial lags for both y and X, and how to differentiate them; models that use spatial regimes, and how to define what the regimes do or do not apply to; the specification of different equations for seemingly unrelated regressions (SUR); the combination of both regimes and SUR; and the list goes on, getting more and more complicated. Add to all this the fact that spreg has a standalone GUI called GeoDaSpace.

I look forward to seeing what you will propose. However, a no-go for spreg is anything that is not backwards-compatible. Any additional feature must be able to conserve what is already there, so it doesn't break all that is built on top of

Additionally, I understand the motivation when you say, for example, that using mathematical symbols makes code harder to follow. However, they in fact make it easier for anyone familiar with the literature. One example: if you were to state `GM_Lag.residuals`, I would have difficulty understanding precisely what you are referring to, since spatial lag (and error) models have two different kinds of residuals. So please do keep these things in mind. IMO, the code should be easier to read for those familiar with the spatial econometrics literature rather than for computer scientists.

Please let me know if you need any support from my side.
Pursuant to the discussion in #95, I've implemented a version of Wilkinson formulas for spatial lag and spatial error models. The code is available on the `formula` branch of my fork of `spreg`.

`spreg.from_formula()` parses a Wilkinson formula and returns a fitted OLS, spatial lag, or spatial error model depending on the user's inputs. The function signature is:

`spreg.from_formula(formula, df, w=None, method="gm", debug=False, **kwargs)`

where:

- `formula` is a string following `formulaic`'s syntax and the additional syntax below
- `df` is a `DataFrame` or `GeoDataFrame`
- `w` is a `libpysal.weights.W` object
- `method` is a string describing the estimation method (for spatial lag or error models)
- `debug` is a boolean which (when true) outputs the parsed formula alongside the configured model
- `**kwargs` are additional arguments to be passed to the model class

The new formula syntax comes in two parts:

- The `<...>` operator:
  - `<` and `>` are not reserved characters in `formulaic`, so there are no conflicts.
  - `<FIELD>` adds the spatial lag of that field from the dataframe to the model matrix.
  - Expressions inside `<...>`, e.g. `<{10*FIELD1} + FIELD2>`, will be correctly parsed.
- The `&` symbol:
  - `&` is not a reserved character in `formulaic`, so there are no conflicts.
- Combining `<...>` and `&`: `<FIELD1 + ... + FIELDN> + &` is the most general possible spatial model available. However, the dispatcher does not currently dispatch to the combo model classes (future TODO).

Importantly, all terms and operators MUST be space delimited in order for the parser to properly pick up on the tokens. The current design also requires the user to have constructed a weights matrix first, which I think makes sense, as the weights functionality is well-documented and external to the actual running of the model.

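
To make the space-delimiting requirement concrete, here is a minimal sketch of how a whitespace-based tokenizer might separate lagged terms from ordinary ones. It is not the parser on the `formula` branch: the function name `classify_formula`, the returned dictionary, and the field names `y`, `x1`, `x2`, `x3` are all hypothetical, and the sketch ignores everything `formulaic` itself would handle.

```python
def classify_formula(formula):
    """Split a space-delimited formula into plain terms, lagged terms, and an error flag.

    Illustrative only: a hypothetical helper, not the spreg implementation.
    """
    response, rhs = (side.strip() for side in formula.split("~", 1))
    plain, lagged = [], []
    has_error = False
    in_lag = False  # True while we are between "<" and ">"

    for token in rhs.split():
        if token == "+":
            continue
        if token == "&":
            has_error = True
            continue
        if token.startswith("<"):
            in_lag = True
            token = token[1:]
        closes = token.endswith(">")
        if closes:
            token = token[:-1]
        (lagged if in_lag else plain).append(token)
        if closes:
            in_lag = False

    return {"response": response, "plain": plain, "lagged": lagged, "error": has_error}


# Properly spaced: the lag group and the error symbol are both recognised.
spaced = classify_formula("y ~ x1 + <{10*x2} + x3> + &")

# Not spaced: "x1+<x2>" arrives as one opaque token, so no lag is detected.
unspaced = classify_formula("y ~ x1+<x2>")
```

With a splitter like this, the unspaced formula silently loses its lag term, which is one plausible reason for insisting on spaces around every term and operator. Under the proposal, a spaced formula of this shape would be passed along with a dataframe and a prebuilt weights object, e.g. `spreg.from_formula("y ~ x1 + <x2>", df, w=w)`.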
While implementing this, I ran into stumbling blocks in other parts of the `spreg` API that have led me to work on a redesign of the basic modeling classes and their dependencies. These changes can be found in the `api-dev` branch of my fork of `spreg`, where I've been streamlining the user-oriented API to OLS (see `prop_ols.py`), spatial lag (`prop_lag.py`), and spatial error (`prop_err.py`) models. These new interfaces are works in progress and will be described further in a future Feature Enhancement Proposal. However, the new interfaces I've created are not backwards-compatible with current `spreg` code. Looking ahead, would it make sense to focus on designing a new package with updated spatial regression interfaces, or to create parallel interfaces in `spreg` to the same estimation code?
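
For reference, the "parallel interfaces" option could look roughly like the sketch below: an existing array-based class keeps its signature, and a new entry point delegates to the same estimation routine. Every name here (`_fit_simple_ols`, `LegacyOLS`, `FormulaOLS`) is a hypothetical stand-in rather than anything in `spreg` or the `api-dev` branch, and the estimator is a toy one-regressor least-squares fit chosen only to keep the example self-contained.

```python
def _fit_simple_ols(y, x):
    """Shared estimation code: intercept and slope for a single regressor."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [mean_y - slope * mean_x, slope]


class LegacyOLS:
    """Stand-in for an existing array-based class; its signature is left untouched."""

    def __init__(self, y, x):
        self.betas = _fit_simple_ols(y, x)


class FormulaOLS:
    """Hypothetical parallel interface: a new entry point over the same estimator."""

    def __init__(self, response, regressor):
        self.response = response
        self.regressor = regressor

    def fit(self, data):
        # Look up columns by name, then reuse the shared estimation code.
        self.betas = _fit_simple_ols(data[self.response], data[self.regressor])
        return self


data = {"y": [1, 3, 5, 7], "x": [0, 1, 2, 3]}
old_style = LegacyOLS(data["y"], data["x"])
new_style = FormulaOLS("y", "x").fit(data)
```

Both objects end up with identical coefficients because only the front end differs, so code built on the older signature keeps working while the newer interface evolves separately.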