RFC: Add lombscargle based surrogate for irregular ts #67

Merged
merged 8 commits into from
Jan 16, 2022

felixcremer
Contributor

This adds a new method to construct surrogates for irregularly sampled time series based on
the Lomb-Scargle periodogram, as discussed in #12.

Tests, docs, and some low-hanging speed-ups are still missing.
I am going to add docstrings and try to make it faster tomorrow.

I tried it visually and this seems to give reasonable surrogates, but unfortunately, it is really slow at the moment.
I am sure that we can get a little bit faster by reusing some memory.
The commented-out code was an attempt to fully implement the algorithm as described in
Schmitz and Schreiber, but this destroys the convergence of the algorithm.
This PR also doesn't yet implement the smarter drawing strategies described in the paper.

This adds a new method to construct surrogates for irregularly sampled time series based on
the Lomb-Scargle periodogram, which is an alternative to the Fourier
transform for obtaining the periodogram.
Tests, docs, and some low-hanging speed-ups are still missing.

Member

@kahaaga kahaaga left a comment

I added a few comments not related to the functionality of the code. These should be addressed before the PR is made ready for merging.

src/methods/lombscargle.jl Outdated
src/api.jl Outdated
@kahaaga
Member

kahaaga commented May 28, 2020

I tried it visually and this seems to give reasonable surrogates,

Could you add an example in the docs page with the example you tried? I'm curious to see how it looks!

... but unfortunately, it is really slow at the moment.

It is known that the simulated annealing process is slow, so this is no surprise. No problem! Try to speed it up as best you can, and we can see if there are any more obvious improvements that can be made after that.

src/api.jl Outdated
This deletes the surrogate function for signal and time axis and puts
the time axis into the LS type. LS now also follows the surrogenerator
API. The arguments to the LS method are now incorporated into
the method struct.
The LombScargle plan is reused in the simulated annealing
algorithm.
This works because we are only shuffling the data around, and we can
instead shuffle the time vector and get the same result.
This is the same trick as in the LombScargle.bootstrap function.
At the end we use the permutation of the time vector to permute the
input signal to get a surrogate vector.
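
The equivalence that this commit message relies on can be checked with a small sketch. It assumes a hand-rolled classical Lomb-Scargle power; `ls_power` and the test signal are hypothetical and are not the LombScargle.jl API. Shuffling the values with permutation `p` on a fixed time axis gives the same set of (time, value) pairs as keeping the values fixed and shuffling the time stamps with the inverse permutation.

```julia
using Random

# Classical Lomb-Scargle power at angular frequency ω.
# Hand-rolled for illustration; this is NOT the LombScargle.jl implementation.
function ls_power(t, y, ω)
    ym = y .- sum(y) / length(y)                       # remove the mean
    τ = atan(sum(sin.(2ω .* t)), sum(cos.(2ω .* t))) / (2ω)
    c = cos.(ω .* (t .- τ))
    s = sin.(ω .* (t .- τ))
    return 0.5 * (sum(ym .* c)^2 / sum(abs2, c) + sum(ym .* s)^2 / sum(abs2, s))
end

rng = MersenneTwister(1)
n = 50
t = sort(10 .* rand(rng, n))                           # irregular sampling times
y = cos.(2π .* t ./ 3) .+ 0.3 .* randn(rng, n)         # noisy cosine (made-up test signal)
ωs = range(0.2, stop=5.0, length=40)

p = randperm(rng, n)                                   # random permutation of the indices
pinv = invperm(p)

P_data_shuffled = [ls_power(t, y[p], ω) for ω in ωs]    # move the values
P_time_shuffled = [ls_power(t[pinv], y, ω) for ω in ωs] # move the time stamps instead

# The two periodograms agree up to floating-point summation order.
max_diff = maximum(abs.(P_data_shuffled .- P_time_shuffled))
```

Because the permuted time vector fully determines the surrogate, `y[p]` can be recovered at the end from the accepted permutation alone.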
@felixcremer felixcremer changed the title WIP: Add lombscargle based surrogate for irregular ts RFC: Add lombscargle based surrogate for irregular ts Jun 3, 2020
if the minkowski distance of order `q` between the power spectrum of the surrogate data and the original data is less than before.
The iteration procedure ends when the relative deviation between the periodograms is less than `tol` or when `N_total` number of tries or `N_acc` number of actual swaps is reached.

It is similar to the [`IAAFT`](@ref) method for regular time series.
Member

Here, I'd be specific and say "For time series with equidistant time steps, the simulated annealing process generates surrogates similar to those produced by the IAAFT method."
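
The acceptance rule and stopping criteria from the docstring excerpt above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ls_power`, `periodogram`, `minkowski` and `swap_surrogate` are hypothetical names, the "relative deviation" is assumed to be the Minkowski distance divided by the Minkowski norm of the original periodogram, and the default values are chosen for the toy example only.

```julia
using Random

# Hand-rolled classical Lomb-Scargle power (illustration only, not LombScargle.jl).
function ls_power(t, y, ω)
    ym = y .- sum(y) / length(y)
    τ = atan(sum(sin.(2ω .* t)), sum(cos.(2ω .* t))) / (2ω)
    c = cos.(ω .* (t .- τ))
    s = sin.(ω .* (t .- τ))
    return 0.5 * (sum(ym .* c)^2 / sum(abs2, c) + sum(ym .* s)^2 / sum(abs2, s))
end

periodogram(t, y, ωs) = [ls_power(t, y, ω) for ω in ωs]
minkowski(a, b, q) = sum(abs.(a .- b) .^ q)^(1 / q)

function swap_surrogate(rng, t, y, ωs; tol=0.05, N_total=3000, N_acc=1000, q=1)
    target = periodogram(t, y, ωs)
    scale = minkowski(target, zero(target), q)    # assumed normalisation of the deviation
    s = shuffle(rng, y)                           # start from a random shuffle
    cost = minkowski(periodogram(t, s, ωs), target, q)
    initial_cost = cost
    tries = 0
    accepted = 0
    while cost / scale > tol && tries < N_total && accepted < N_acc
        tries += 1
        i, j = rand(rng, 1:length(s)), rand(rng, 1:length(s))
        s[i], s[j] = s[j], s[i]                   # propose a swap of two values
        new_cost = minkowski(periodogram(t, s, ωs), target, q)
        if new_cost < cost                        # accept only if the spectra got closer
            cost = new_cost
            accepted += 1
        else
            s[i], s[j] = s[j], s[i]               # reject: undo the swap
        end
    end
    return (surrogate=s, initial_cost=initial_cost, final_cost=cost,
            tries=tries, accepted=accepted)
end

rng = MersenneTwister(42)
t = sort(10 .* rand(rng, 40))
y = cos.(2π .* t ./ 3) .+ 0.3 .* randn(rng, 40)
ωs = range(0.2, stop=5.0, length=30)
result = swap_surrogate(rng, t, y, ωs)
```

Accepting only improving swaps corresponds to annealing at zero temperature; a temperature-based acceptance rule would occasionally accept a worse swap as well.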

export LS
"""
LS(t; tol=1, N_total=10000, N_acc=2000,q=1)
Compute a surrogate of an irregular time series with supporting time steps `t` based on the simulated annealing algorithm described in [^SchreiberSchmitz1999].
Member

Be explicit that it is the time axis that is irregular. As stated now, it is not clear whether the data values or their (time) indices are irregular.

@@ -0,0 +1,25 @@
# Irregular Timeseries Surrogates
Member

Capitalization only for the first word. I'd also say "Surrogates for time series with irregular time indices", or something along those lines, to be specific about what is irregular.

# Irregular Timeseries Surrogates


To derive a surrogate for irregular time series we can use surrogate methods which are irrespective of the time axis like [`RandomShuffle`](@ref) or [`BlockShuffle`](@ref)
Member

The statement "... surrogate methods which are irrespective of the time axis" doesn't quite make sense. Maybe it would be better to say something like "To derive a surrogate for time series which are irregularly sampled, we can use surrogate methods that do not explicitly require regularly sampled time series, for example RandomShuffle or BlockShuffle. Alternatively, we need to use algorithms that take the irregularity of the time axis into account."

To derive a surrogate for irregular time series we can use surrogate methods which are irrespective of the time axis like [`RandomShuffle`](@ref) or [`BlockShuffle`](@ref)
or we need to use algorithms, which take the irregularity of the time axis into account.

## LombScargle based surrogate
Member

@kahaaga kahaaga Jun 3, 2020

In the literature, Lomb-Scargle is used instead of LombScargle. So say "Lomb-Scargle-based surrogate"


## LombScargle based surrogate

The LS surrogate is a form of a constrained surrogate which takes the LombScargle periodogram to derive surrogates with similar phase distribution.
Member

Lomb-Scargle again


Member

This sentence is incomplete. "to derive surrogates with similar phase distribution". ... similar to what? I'm guessing you mean similar phase distribution as the original time series!

@kahaaga
Member

kahaaga commented Jun 3, 2020

@felixcremer Nice work! I have a few comments regarding the wording in the documentation that need to be fixed.

Memory allocation and runtime also seems excessive. 243.765771 seconds and 192 Gb memory allocated to generate a surrogate for a time series with 1000 observations seems excessive. Did you run the algorithm on a short time series to trigger compilation, or does this timing include compilation? You can hide such an initial step in Documenter using a `@setup` block.

If it is really this slow, I can also have a look and see if there are any tricks to maybe avoid copying so much.

@felixcremer
Contributor Author

I would be happy to get a round of comments from you.

I am not sure, whether the docs should be about only this lombscargle or whether we would like to include a part about surrogates for irregular time series in general.

When I test this surrogate with the cosine with noise as in the tests, I get often the original data with slightly shifted noises and it looks as if I would get the original data, when I would let the simulated annealing run further. Where I would rather expect to get a shifted periodic signal compared to my original data.
Is this something that is expected and is this also happening with the IAAFT method?

This is still missing the better selection algorithm for the swapped values based on the rank
and the other acceptance criteria based on temperature.
These two changes should "only" change the speed so I am not sure, whether this has to happen in this PR.

@kahaaga
Member

kahaaga commented Jun 3, 2020

I am not sure, whether the docs should be about only this lombscargle or whether we would like to include a part about surrogates for irregular time series in general.

I think the doc page for the method should be titled "Simulated annealing surrogates". We don't move random shuffle surrogates to a separate page, so this surrogate method shouldn't either. It is sufficient to mention in the docstring for the method that it also works on irregularly sampled data.

When I test this surrogate with the cosine with noise as in the tests, I get often the original data with slightly shifted noises and it looks as if I would get the original data, when I would let the simulated annealing run further. Where I would rather expect to get a shifted periodic signal compared to my original data.
Is this something that is expected and is this also happening with the IAAFT method?

I will experiment a bit and read the original simulated annealing paper in more detail before I get back to this.

The IAAFT method yields surrogates that do not merely phase-shift the data (that would just be time-shifted surrogates), but completely randomize the phases. That means that most of the time, you get rather different-looking time series (because oscillations of different magnitudes are shifted around). In rare cases, however, you could get time series that are practically identical to the original. The only requirement is the preservation (to some tolerance) of the power spectrum.

It may very well happen that for a single surrogate realization, the randomization happens in a manner such that the low frequencies are roughly preserved in the original data in the time domain, but shuffles the high frequencies / noise (as you describe). However, this only happens occasionally.

Is this behaviour consistent over multiple surrogate realizations?

I'm guessing that the simulated annealing process should also generate time series that are phase-randomized in the same manner as for the IAAFT method, but I'm not entirely sure I completely understand the procedure. I'll get back to you once I have studied the paper more closely.
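
To make the phase-randomization idea concrete for regularly sampled data, here is a minimal sketch. It shows plain random-phase Fourier surrogates, not the iterative IAAFT algorithm, which additionally restores the amplitude distribution of the original values. The naive `dft`/`idft` helpers and `random_phase_surrogate` are hypothetical names used only to avoid a package dependency, and the sketch assumes an odd number of samples.

```julia
using Random

# Naive O(n^2) DFT pair, used only to keep the sketch free of package dependencies.
function dft(x)
    n = length(x)
    return [sum(x[m + 1] * cis(-2π * k * m / n) for m in 0:n-1) for k in 0:n-1]
end

function idft(X)
    n = length(X)
    return [sum(X[k + 1] * cis(2π * k * m / n) for k in 0:n-1) / n for m in 0:n-1]
end

function random_phase_surrogate(rng, x)
    n = length(x)
    isodd(n) || throw(ArgumentError("this sketch assumes an odd number of samples"))
    X = dft(x)
    Y = copy(X)                                  # Y[1] (the mean) is left untouched
    for k in 1:(n - 1) ÷ 2
        φ = 2π * rand(rng)                       # new random phase for frequency k
        Y[k + 1] = abs(X[k + 1]) * cis(φ)        # keep the amplitude, replace the phase
        Y[n - k + 1] = conj(Y[k + 1])            # Hermitian symmetry keeps the result real
    end
    return real.(idft(Y))
end

rng = MersenneTwister(7)
n = 101
x = cos.(2π .* (0:n-1) ./ 20) .+ 0.3 .* randn(rng, n)
s = random_phase_surrogate(rng, x)

# Amplitude spectra agree, while the time series themselves differ.
amp_diff = maximum(abs.(abs.(dft(s)) .- abs.(dft(x))))
ts_diff = maximum(abs.(s .- x))
```

Each realization keeps the amplitude spectrum but redistributes the oscillations in time, which is why most surrogates look quite different from the original.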

This is still missing the better selection algorithm for the swapped values based on the rank
and the other acceptance criteria based on temperature.
These two changes should "only" change the speed so I am not sure, whether this has to happen in this PR.

These criteria don't necessarily need to be included at this stage. As long as the method docstring states which parts of the original implementations are included and which parts are not, we should be fine. But I'll have a look at that too when I study the paper --- maybe it's not that complicated to implement? I'll have a look!

@Datseris
Member

Datseris commented Jun 3, 2020

@felixcremer thanks for the work! My review would be identical to Kristian's so I won't "spam" the same stuff! Unfortunately I don't have the spare time now to look at the papers so I don't feel I can help much further in the details, sorry!

@felixcremer
Contributor Author

felixcremer commented Jun 8, 2020

Memory allocation and runtime also seems excessive. 243.765771 seconds and 192 Gb memory allocated to generate a surrogate for a time series with 1000 observations seems excessive. Did you run the algorithm on a short time series to trigger compilation, or does this timing include compilation?

Yes, this is after I ran it once with a short time series.
These are the timings for the cosine with noise on a noisy time axis, for different numbers of time steps.

N = 10
0.010424 seconds (7.94 k allocations: 923.219 KiB)
N = 100
2.516297 seconds (10.27 M allocations: 2.350 GiB, 7.79% gc time)
N = 1000
237.660706 seconds (350.16 M allocations: 176.666 GiB, 3.19% gc time)

As we can see, the runtime grows roughly as O(N^2), which is also the expected behaviour described in the paper.
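
As a quick check of that scaling claim, the empirical exponent between consecutive measurements can be estimated from the timings reported above as log(T2/T1) / log(N2/N1). The variable names below are chosen for illustration.

```julia
# Timings copied from the measurements above.
N = [10, 100, 1000]
seconds = [0.010424, 2.516297, 237.660706]

# Empirical scaling exponent between consecutive problem sizes.
exponents = [log(seconds[k + 1] / seconds[k]) / log(N[k + 1] / N[k]) for k in 1:2]
println(round.(exponents; digits=2))
```

The exponents come out at roughly 2.4 (N = 10 to 100) and 2.0 (N = 100 to 1000), consistent with approximately quadratic growth for the larger sizes.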

If it is really this slow, I can also have a look and see if there are any tricks to maybe avoid copying so much.

Yes please, but I have profiled it and most of the time is spent in the computation of the Lomb-Scargle periodogram. So we would either need to find a way to compute only the parts of the Lomb-Scargle periodogram that are changed by swapping two values, or we would need to reduce the overall number of steps at which we need to compare periodograms.
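
One way the first option could look, as a sketch under assumptions rather than a description of LombScargle.jl: in the classical Lomb-Scargle formula the time offset τ and the cosine and sine basis vectors depend only on the time axis, and the mean of the data does not change under a permutation. Swapping two values therefore changes each projection sum by a two-term correction that costs O(1) per frequency instead of O(N). The helper `ls_basis` and all variable names are hypothetical.

```julia
using LinearAlgebra, Random

# Per-frequency basis vectors of the classical Lomb-Scargle formula.
# They depend only on the time axis, so they can be computed once.
function ls_basis(t, ωs)
    C = zeros(length(ωs), length(t))
    S = zeros(length(ωs), length(t))
    for (k, ω) in enumerate(ωs)
        τ = atan(sum(sin.(2ω .* t)), sum(cos.(2ω .* t))) / (2ω)
        C[k, :] .= cos.(ω .* (t .- τ))
        S[k, :] .= sin.(ω .* (t .- τ))
    end
    return C, S
end

# Power from the projections a = C*y and b = S*y of the mean-removed data.
power(a, b, C, S) = 0.5 .* (a .^ 2 ./ vec(sum(abs2, C; dims=2)) .+
                            b .^ 2 ./ vec(sum(abs2, S; dims=2)))

rng = MersenneTwister(3)
t = sort(10 .* rand(rng, 60))
y = cos.(t) .+ 0.2 .* randn(rng, 60)
y .-= sum(y) / length(y)                    # centre once; swaps do not change the mean
ωs = collect(range(0.2, stop=5.0, length=25))

C, S = ls_basis(t, ωs)
a, b = C * y, S * y                         # full projections: O(N) work per frequency

i, j = 7, 42                                # indices of the two values to swap
δ = y[j] - y[i]
a_fast = a .+ δ .* (C[:, i] .- C[:, j])     # O(1) update per frequency
b_fast = b .+ δ .* (S[:, i] .- S[:, j])

y[i], y[j] = y[j], y[i]                     # perform the swap and recompute for comparison
a_full, b_full = C * y, S * y
P_fast = power(a_fast, b_fast, C, S)
P_full = power(a_full, b_full, C, S)
```

Whether this maps onto the precomputed plan used in the PR is not something this sketch establishes; it only shows that the classical formula admits an incremental update.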

@felixcremer
Contributor Author

It may very well happen that for a single surrogate realization, the randomization happens in a manner such that the low frequencies are roughly preserved in the original data in the time domain, but shuffles the high frequencies / noise (as you describe). However, this only happens occasionally.

Is this behaviour consistent over multiple surrogate realizations?

This happens in enough realizations that I noticed it, and the resulting surrogates look like this:

[Figure: similar_surrogate_ls]

Here y1 is the original time series and y2 is the surrogate. I am not sure whether this is expected behaviour and the cosine with noise is simply a bad example for this type of surrogate.
I would rather have expected something like this:
[Figure: onlysurrogate]

@kahaaga
Member

kahaaga commented Jan 15, 2022

@Datseris I'll have a final look at this tomorrow, and hopefully we can merge and finalize the JOSS paper after that.

@kahaaga
Member

kahaaga commented Jan 16, 2022

@Datseris There are some changes to LombScargle.jl that lead to the code in this PR not working.

I can't directly add changes to this branch without pushing a new PR to Felix's fork, which is cumbersome, so I'm merging this to master, then creating a new branch with my updates.

@kahaaga kahaaga marked this pull request as ready for review January 16, 2022 10:58
@kahaaga kahaaga merged commit da803ce into JuliaDynamics:master Jan 16, 2022
@Datseris
Member

@felixcremer thank you for the PR. Please, when you do PRs in open source software, leave the button "allow commits by maintainers" checked.
