Regressor generalization #101
Merged
There is a lot of coupling when it comes to phased/unphased data, so I had to make many changes in this one commit. The main point was to work with temporal-space data instead of phase-space data wherever possible. This was especially important in the `Fourier` class, to prepare for multiperiodic signal support. The period now needs to be provided to `Fourier` before `Fourier.fit` or `Fourier.transform` is called. To allow the `period` parameter to be set properly when `Fourier` is nested within another estimator, I made `Fourier` a subclass of scikit-learn's `BaseEstimator`, which we should have done long ago.

Most of the changes in `get_lightcurve` replace `phase` with `time`. There are also changes which provide the correct data when dealing with masked arrays, and a new piece of code which passes the period into the `Fourier` object; this required a little hackery (see the comments before the call to `predictor.set_params`). The `rephase` function in `plotypus.periodogram` was also modified so that masked values are still rephased.
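The mechanism that makes this possible is scikit-learn's nested-parameter convention, which any `BaseEstimator` subclass gets for free via `get_params`/`set_params`. A minimal sketch using stock scikit-learn estimators as stand-ins (the pipeline and parameter names here are illustrative, not plotypus's actual call):

```python
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Stand-in for an estimator with a nested transformer; any
# BaseEstimator subclass exposes get_params/set_params, which is
# what enables reaching into the nested object.
predictor = Pipeline([
    ("features", PolynomialFeatures()),
    ("regressor", LinearRegression()),
])

# The "step__param" double-underscore naming sets a parameter on the
# nested estimator, analogous to passing the period into a nested
# Fourier object before fitting.
predictor.set_params(features__degree=5)
print(predictor.get_params()["features__degree"])  # → 5
```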
Instead of multiplying each index `i` by `2 * pi` and then dividing every element of the matrix by `period`, I now define `omega = 2 * pi / period` (as we usually do) and multiply each index `i` by that.
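The refactor amounts to hoisting a constant out of the per-element arithmetic; a small NumPy sketch with illustrative variable names, showing the two computations agree:

```python
import numpy as np

period = 0.71
indices = np.arange(1, 5)  # harmonic indices i = 1..4

# Before: multiply each index by 2*pi, then divide by the period.
before = indices * 2 * np.pi / period

# After: precompute omega = 2*pi / period once, then scale the indices.
omega = 2 * np.pi / period
after = indices * omega

assert np.allclose(before, after)
```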
Added a new command line option, `--regressor-options key1 val1 ... keyN valN`, which allows the user to pass arbitrary arguments to the regressor. The old regressor options (such as `--max-iter`) are now ignored and will be removed soon. For now, users need to at least pass `fit_intercept False` to retain the previous behavior, though this will change soon.
All CLI regressor options are now encompassed by the `--regressor-options` option. The old options have been removed completely. Also, a help string has been added to `--regressor-options`.
Now the Fourier design matrix no longer provides the intercept column, and relies on the regressor object to fit the intercept separately. This is already the default behavior of all sklearn regressors (that I'm aware of), and so this simply saves the user from needing to provide `fit_intercept=False` every time.
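A minimal sketch of a Fourier design matrix with no ones column, leaving the intercept to the regressor. This is an illustration of the idea, not plotypus's exact `design_matrix`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fourier_design_matrix(times, period, degree):
    """Columns sin(i*omega*t), cos(i*omega*t) for i = 1..degree;
    note there is no leading column of ones."""
    omega = 2 * np.pi / period
    t = np.asarray(times)
    cols = [f(i * omega * t)
            for i in range(1, degree + 1)
            for f in (np.sin, np.cos)]
    return np.column_stack(cols)

t = np.linspace(0, 1, 50, endpoint=False)
X = fourier_design_matrix(t, period=1.0, degree=3)
y = 2.5 + np.sin(2 * np.pi * t)  # signal with a nonzero mean

# fit_intercept=True (the scikit-learn default) recovers the mean
# separately, so the design matrix need not carry a ones column.
reg = LinearRegression().fit(X, y)
print(reg.intercept_)
```

With the intercept fit by the regressor, `reg.intercept_` comes out at the signal's mean level (2.5 here), which is exactly the `A_0` term discussed below.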
Since it takes a series of key-value pairs, I made the metavar "KEY VALUE". This makes it look like there are two metavars (as desired), when it's really just one containing a space.
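The metavar trick can be sketched with `argparse`; the option name matches the description above, but the exact `add_argument` call is an assumption:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="+" collects a flat list of tokens; the single metavar
# "KEY VALUE" (one string containing a space) renders in --help as
# though there were two metavars per pair.
parser.add_argument("--regressor-options", nargs="+", default=[],
                    metavar="KEY VALUE")

args = parser.parse_args(
    ["--regressor-options", "fit_intercept", "False", "max_iter", "1000"])
print(args.regressor_options)
# ['fit_intercept', 'False', 'max_iter', '1000']
```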
Instead of having the `--regressor` option take one of a few hand-chosen values, it now takes the name of _any arbitrary regressor object_, such as the default, `--regressor sklearn.linear_model.LassoLarsIC`. This requires a little more user-knowledge, but opens a world of possibilities. This is accomplished through the new `plotypus.utils.import_name` function, which imports an object from its full name and returns it, without affecting the enclosing namespace.
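A function like `import_name` can be sketched with the standard library's `importlib`; this is a guess at the behavior described above, not plotypus's actual code:

```python
from importlib import import_module

def import_name(full_name):
    """Import an object from its fully qualified name, e.g.
    'sklearn.linear_model.LassoLarsIC', and return it without
    binding anything in the caller's namespace."""
    module_name, _, object_name = full_name.rpartition(".")
    module = import_module(module_name)
    return getattr(module, object_name)

# Example with a standard-library name so the sketch is self-contained.
OrderedDict = import_name("collections.OrderedDict")
print(OrderedDict.__name__)  # → OrderedDict
```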
This resolves issue #90, and the regressor-specific portion of #91.

It removes the hand-chosen regressor objects, such as `--regressor LassoCV`, and replaces them with the ability to specify any regressor. So the above call would need to be more verbose: `--regressor sklearn.linear_model.LassoCV`. But in exchange for some verbosity, we gain complete generality. This is implemented with `plotypus.utils.import_name`.

This also removes the `--max-iter` and `--regularization-cv` options, which were passed to specific regressors, and instead implements a catch-all `--regressor-options KEY VALUE ... KEY VALUE` option. This allows the user to provide any option to their regressor, provided `VALUE` is a simple Python literal (parsable by the standard library function `ast.literal_eval`) or a string. Value parsing is implemented with `plotypus.utils.literal_eval_str`, and conversion to a `dict` is done with `plotypus.utils.strlist_to_dict`.

Because of these changes, we can no longer assume that `fit_intercept=False` on the regressors. In fact, it is most likely that `fit_intercept=True`, as that is the default for every scikit-learn regressor that I know of. For this reason, I have removed the ones column from `preprocessing.Fourier.design_matrix`, and `get_lightcurve` now takes the `A_0` term from the regressor's `intercept_` attribute.
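The described parsing (literal if possible, string otherwise) can be sketched as follows; the function names mirror the ones mentioned above, but the bodies are assumptions:

```python
import ast

def literal_eval_str(value):
    """Parse a simple Python literal; fall back to the raw string."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value

def strlist_to_dict(tokens):
    """Convert a flat [key1, val1, key2, val2, ...] list to a dict."""
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of KEY VALUE tokens")
    return {key: literal_eval_str(val)
            for key, val in zip(tokens[::2], tokens[1::2])}

options = strlist_to_dict(
    ["fit_intercept", "False", "max_iter", "1000", "criterion", "bic"])
print(options)
# {'fit_intercept': False, 'max_iter': 1000, 'criterion': 'bic'}
```

Note that `"bic"` is not a valid Python literal, so it survives as a plain string, while `"False"` and `"1000"` are parsed into a bool and an int.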