Exposing rank deficient info #21

Open
gragusa opened this issue Oct 24, 2022 · 1 comment

Comments

@gragusa

gragusa commented Oct 24, 2022

I want to discuss the best way to have an API for RegressionModels that are rank deficient. In particular, I would like to have a mechanism allowing packages interfacing through StatsAPI to recover the collinear columns of the model matrix.

Consider this example (I am going to use GLM.jl):

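using DataFrames, GLM, LinearAlgebra

X1 = randn(100)
X2 = randn(100)
y = X1 + X2 + randn(100)
# X3 duplicates X1 and X4 is X1 + X2, so the model matrix is rank deficient
df = DataFrame(y=y, X1=X1, X2=X2, X3=X1, X4=X1 .+ X2)
lm_1 = lm(@formula(y ~ X1 + X2 + X3 + X4), df)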

Through StatsAPI alone, there is no way to recover which three columns of the model matrix (five columns here, counting the intercept) are linearly independent. However, one can do this by inspecting the pivoted Cholesky factorization stored in the DensePredChol (and, eventually, the DensePredQR).

For instance, if I want to calculate X'X using only these columns, I could do

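X = modelmatrix(lm_1)
XX = X'X
ch = lm_1.model.pp.chol   # pivoted Cholesky factorization held by the DensePredChol
rnk = rank(ch)
# ch.p[rnk+1:end] lists the collinear columns; drop them from both rows and columns
XX[1:end .∉ [ch.p[rnk+1:end]], 1:end .∉ [ch.p[rnk+1:end]]]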

and similarly, for the columns of the modelmatrix:

Z = X[:, 1:end .∉ [ch.p[rnk+1:end]]] 
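
For anyone who wants to try the idea without a fitted GLM model, the same column selection can be sketched with LinearAlgebra alone. This is only an illustration under assumptions: the model matrix is built by hand to mirror the example above, `RowMaximum()` needs Julia 1.7 or later, and `tol = -1.0` asks LAPACK for its default rank tolerance (which I believe matches what GLM does internally, but have not verified here).

```julia
using LinearAlgebra, Random

Random.seed!(1)                      # any seed works; fixed only for reproducibility
n = 100
X1, X2 = randn(n), randn(n)
# Hand-built model matrix: intercept, X1, X2, X3 = X1, X4 = X1 + X2
X = hcat(ones(n), X1, X2, X1, X1 .+ X2)

# Pivoted Cholesky of X'X; check = false avoids an error on rank deficiency
ch = cholesky(Symmetric(X'X), RowMaximum(); tol = -1.0, check = false)
rnk = rank(ch)

dropped = ch.p[rnk+1:end]            # pivot indices of the collinear columns
keep = setdiff(1:size(X, 2), dropped)

Z = X[:, keep]                       # linearly independent columns only
ZZ = Z'Z                             # full-rank cross-product
```

Which of the collinear columns end up in `dropped` depends on the pivoting order, so downstream code should rely on the indices rather than on a particular column being kept.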

I don't know the best way to expose this at the API level. Maybe something like this?

function StatsAPI.collinearindexes(r::RegressionModel) end

The implementation in GLM could then look something like this:

function GLM.collinearindexes(m::LinPredModel) ## of course only for `CholeskyPivoted` (and `QRPivoted` when implemented)
  ch = m.pp.chol
  rnk = rank(ch)
  return ch.p[1:rnk], ch.p[rnk+1:end]
end
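
To make the proposed return value concrete without touching GLM internals, here is a hypothetical helper, `split_pivots`, that applies the same logic to a bare `CholeskyPivoted`. The helper name and the small example matrix are purely illustrative and are not part of StatsAPI or GLM.

```julia
using LinearAlgebra

# Hypothetical helper (not part of StatsAPI or GLM): split the pivot indices
# into (independent, collinear) using the numerical rank of the factorization.
function split_pivots(ch::CholeskyPivoted)
    rnk = rank(ch)
    return ch.p[1:rnk], ch.p[rnk+1:end]
end

# Small deterministic example: the third column is the sum of the first two
A = [1.0 0.0 1.0;
     0.0 1.0 1.0;
     1.0 1.0 2.0;
     2.0 1.0 3.0]
ch = cholesky(Symmetric(A'A), RowMaximum(); tol = -1.0, check = false)
indep, collinear = split_pivots(ch)

A_indep = A[:, indep]                # columns judged linearly independent
```

The indices come back in pivot order, not in the original column order, so a caller that needs the original ordering would have to `sort` them.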

Probably not great. Ideas?

P.S.: This is very useful to make packages like CovarianceMatrices play nice with rank-deficient RegressionModels.

@gragusa
Author

gragusa commented Dec 20, 2022

Any comments? Would anybody object to this implementation?
