Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plotting improvements #103

Merged
merged 13 commits into from
Jun 14, 2016
Merged

Plotting improvements #103

merged 13 commits into from
Jun 14, 2016

Conversation

dwysocki
Copy link
Member

Warning: incomplete, do not merge yet

This introduces the changes requested in #102.

There is a lot of coupling when it comes to phased/unphased data, so I
had to make a lot of changes in this one commit.

The main point of this commit was to deal with temporal-space data
instead of phase-space data wherever possible. This was especially
important in the `Fourier` class, to prepare for multiperiodic signal
support. Now the period needs to be provided to `Fourier` before
`Fourier.fit` or `Fourier.transform` is called. To allow for setting the
`period` parameter properly when `Fourier` is nested within another
estimator, I had to make `Fourier` a subclass of scikit-learn's
`BaseEstimator`, which we should have done long ago.

Most of the changes in `get_lightcurve` deal with replacing `phase` with
`time`. There are also some changes which deal with providing the
correct data when dealing with masked arrays. There is also a new piece
of code which passes the period into the `Fourier` object, which
required a little bit of hackery (see the comments before the call to
`predictor.set_params`).

The `rephase` function in `plotypus.periodogram` was also modified, such
that masked values are still rephased.
Instead of multiplying each index `i` by `2 * pi`, and then dividing
every element of the matrix by `period`, I instead define `omega = 2 *
pi / period` (as we usually do), and multiply each index `i` by that.
Added a new command line option: `--regressor-options key1 val1 ... keyN valN`. Allows user to
pass arbitrary arguments. Old regressor options now ignored (such as `--max-iter`) and will be removed soon.
Need to at least pass `fit_intercept False` to retain previous functionality, which will
change soon.
All CLI regressor options are now encompassed by the
`--regressor-options` option. The old options have been removed
completely.

Also, a help string has been added to `--regressor-options`.
Now the Fourier design matrix no longer provides the intercept column,
and relies on the regressor object to fit the intercept separately. This
is already the default behavior of all sklearn regressors (that I'm
aware of), and so this simply saves the user from needing to provide
`fit_intercept=False` every time.
Since it takes a series of key-value pairs, I made the metavar "KEY
VALUE". This makes it look like there are two metavars (desired) when
it's really just one with a space inbetween.
Instead of having the `--regressor` option take one of a few hand-chosen
values, it now takes the name of _any arbitrary regressor object_, such
as the default, `--regressor sklearn.linear_model.LassoLarsIC`. This
requires a little more user-knowledge, but opens a world of
possibilities.

This is accomplished through the new `plotypus.utils.import_name`
function, which imports an object from its full name and returns it,
without affecting the enclosing namespace.
A number of changes have been made relating to phasing and plotting, in
preparation for multiple periods.

Before, the `lightcurve.get_lightcurve` function returned an array
`phased_data`, which contained the original data, with outliers masked,
and with time replaced with phase. Now it has been replaced with the
array `data`, which only differs from the input by masking outliers.

The function also returned fit values with phase [0, 1). Now it instead
returns fit values over the entire time interval, and leaves phasing for
later. This does result in unnecessary computation for singly-periodic
stars, but it will be necessary for multi-periodic stars, and avoiding
the extra computation in the special case would require adding
complexity to the code. This array has been renamed from `lightcurve` to
`data_smooth`, to emphasize its relation to `data`.

Now, instead of plotting the data and fit repeated over 2 periods, we
phase the data such that it repeats after 2 periods instead of one. This
removes the need to duplicate the arrays, causes the observed data
points on each half to be different, and generalizes to multiple
periods, where the data _should not_ repeat after 1 period. This also
allows us to make the number of cycles before repetition _arbitrary_,
and thus a new CLI option `--cycles=2.0` has been added. `cycles` has
also been made a keyword argument to all functions which involve
rephasing, as well as the `plot_lightcurve` functions.

The option to plot _unphased_ data has also been created, although it is
currently inaccessible from the CLI. Simply plot data with
`period=None`, and it will be left unphased. This is rather ugly when
outputting to a small image, and will look much better if we eventually
add support for a format which supports interactive scrolling, or simply
use a matplotlibrc with a _very_ wide figure size.

The unused `--phase-points` CLI option has been removed.

A missing comma was added to the list given by the `__all__` attribute
of `plotypus.utils`.

Some doc strings are out of sync with the changes, though they were out
of sync before the changes as well.
dwysocki added 3 commits June 14, 2016 18:41
Previously we were outputting the fit over the duration of the first
period, which meant leaving out a number of phase points. Now we instead
output the second period, which is problematic in itself, but is at
least an improvement in most real scenarios. Ideally we should just be
outputting phase points which are not sensitive to any period.
We replaced the `--periods` option with a general `--parameters` option,
long ago, but never updated the user guide. This fixes the Beginner's
Guide, but still leaves other pages to be updated.
This guide is out of date, and needs to be updated. I have added a
warning for now.
@dwysocki dwysocki merged commit 6317831 into develop Jun 14, 2016
@dwysocki dwysocki deleted the plotting-improvements branch June 14, 2016 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant