[Medium] Fast solvers for situations with many columns. #133
Out of the solvers we currently have available, I think the only ones that don't construct the full Hessian would be L-BFGS and CD with diag_fisher (but let me know if I'm wrong). So I think diag_fisher is valuable as it's the only way we can reliably use both an L1 penalty and wide data. I don't fully understand the value of the off-diagonal elements of the Hessian for CD when it's only updating one variable at a time. Why does performance degrade if we only use the diagonal?
The current gradient is updated at each step of CD. So, if you only have the diagonal, the gradient update is incorrect for every variable besides the current one.
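To make this concrete, here is a minimal sketch (generic Python, not glum's actual solver; all names are hypothetical) of one CD sweep on a quadratic objective, showing why the running-gradient update needs the full Hessian column:

```python
import numpy as np

# Illustrative sketch: one sweep of coordinate descent on a quadratic
# 0.5 * x^T H x + g0^T x, maintaining the running gradient grad = H x + g0.
def cd_sweep(H, grad, x, use_full_hessian=True):
    m = len(x)
    for j in range(m):
        # Exact minimization over coordinate j.
        delta = -grad[j] / H[j, j]
        x[j] += delta
        if use_full_hessian:
            # Correct update: moving x[j] changes the gradient of *every*
            # coordinate through column j of the Hessian.
            grad += H[:, j] * delta
        else:
            # Diagonal-only update: grad[j] stays consistent, but the
            # gradients of all other coordinates are now stale, so later
            # coordinate steps use wrong values.
            grad[j] += H[j, j] * delta
    return x, grad
```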
Notes from meeting:
Some potential steps:
An interesting idea here would be to implement an L-BFGS + CD solver. At each iteration, the new gradient information would be used to approximate the Hessian using a low-rank approximation. Then, that low-rank approximation could be used inside the CD solver instead of either the full data matrix or the true Hessian.
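As a rough illustration of how that could fit together, here is a hypothetical sketch assuming the L-BFGS-style approximation is available in low-rank-plus-diagonal form, H ≈ diag(d) + V Vᵀ with rank k. None of these names correspond to existing code; the point is that the m x m Hessian is never formed, and each coordinate's gradient update only needs a rank-k product:

```python
import numpy as np

# Hypothetical CD sweep using a low-rank-plus-diagonal Hessian
# approximation H ≈ diag(d) + V @ V.T, where V has shape (m, k).
def cd_sweep_low_rank(d, V, grad, x):
    m, k = V.shape
    for j in range(m):
        # Diagonal entry of the approximation: d[j] + ||V[j]||^2.
        h_jj = d[j] + V[j] @ V[j]
        delta = -grad[j] / h_jj
        x[j] += delta
        # Column j of the approximate Hessian, assembled on the fly:
        # H[:, j] = d[j] * e_j + V @ V[j], so the m x m matrix is
        # never stored.
        grad += (V @ V[j]) * delta
        grad[j] += d[j] * delta
    return x, grad
```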
More good sources on "hessian-free" second-order optimization methods.
@tbenthompson and I were just talking about another alternative which could work. Essentially, the idea would be to create a third, outermost loop in which one samples a tractable column minibatch (say, 500 columns), and then runs the solver for several iterations. In particular, this method might work well with @tbenthompson's new approximate Hessian code, since one could probably be quite aggressive with enforcing sparsity in the delta-D vector, making subsequent iterations of the middle loop very inexpensive due to the now-cheap Hessian update. It also has the advantage of being a relatively easy modification to the existing solver, so it wouldn't be hard to try, and it would give us a baseline to benchmark future methods against.
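A minimal sketch of that outer loop, under stated assumptions: all names are hypothetical, and `inner_solver` is assumed to accept a warm start and an offset accounting for the frozen out-of-batch coefficients:

```python
import numpy as np

# Hypothetical outermost loop: repeatedly sample a tractable subset of
# columns and run the existing solver on just that submatrix, so only a
# batch x batch Hessian ever gets built.
def minibatch_cd(X, y, inner_solver, batch_size=500, n_outer=50,
                 n_inner=5, rng=None):
    rng = np.random.default_rng(rng)
    n, m = X.shape
    coef = np.zeros(m)
    for _ in range(n_outer):
        cols = rng.choice(m, size=min(batch_size, m), replace=False)
        # Contribution of the frozen (out-of-batch) coefficients enters
        # the inner problem as a fixed offset.
        mask = np.ones(m, dtype=bool)
        mask[cols] = False
        offset = X[:, mask] @ coef[mask]
        # Run a few iterations of the existing solver on the submatrix.
        coef[cols] = inner_solver(X[:, cols], y, offset=offset,
                                  coef_init=coef[cols], max_iter=n_inner)
    return coef
```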
Currently, the solvers all build a Fisher matrix/Hessian. If the data matrix is `n x m`, this Fisher matrix will have shape `m x m`. If `m` is less than a few thousand, this is fine. However, if `m` gets very large, possibly due to lots of categorical variables, the current solvers will be inadequate because they are constructing a very large dense matrix.

Currently, we can avoid that matrix construction using the `diag_fisher=True` option. However, this is not particularly well supported and performance might not be very good. We should consider adding a benchmark with more than 10,000 columns to better track performance in these cases.

As part of this effort, it would be nice to separate the `diag_fisher` behavior and simply use a different solver function for high dimensional problems.