RFC: Add lombscargle based surrogate for irregular ts #67
This adds a new method to construct surrogates for irregular time series based on the Lomb-Scargle periodogram, which is an alternative to the Fourier transform for obtaining the periodogram. At the moment it is missing tests, docs and some low-hanging speed-ups.
I added a few comments not related to the functionality of the code. These should be addressed before the PR is made ready for merging.
Could you add an example in the docs page with the example you tried? I'm curious to see how it looks!
It is known that the simulated annealing process is slow, so this is no surprise. No problem! Try to speed it up as best you can, and we can see if there are any more obvious improvements that can be made after that.
This deletes the surrogate function for signal and time axis and puts the time axis into the LS type. We are also following the surrogenerator API with LS. The arguments to the LS method are now incorporated into the method struct.
We are reusing the Lomb-Scargle plan in the simulated annealing algorithm. This works because we are only shuffling the data around, so we can instead shuffle the time vector and get the same result. This is the same trick as in the LombScargle.bootstrap function. At the end we use the permutation of the time vector to permute the input signal to get a surrogate vector.
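The equivalence this relies on can be checked with a small sketch. It is not the PR's code: `ls_power` is a hand-written classical Lomb-Scargle formula standing in for LombScargle.jl, and the test signal, frequency grid and variable names are assumptions made for illustration. Because the periodogram is a sum over (time, value) pairs, shuffling the values with `p` gives the same pairs as shuffling the times with `invperm(p)`.

```julia
using Random

# Classical Lomb-Scargle power at angular frequencies ω for values x sampled at times t.
# Hand-written stand-in for illustration; the PR itself uses LombScargle.jl.
function ls_power(t, x, ω)
    y = x .- sum(x) / length(x)                       # remove the mean
    map(ω) do w
        τ = atan(sum(sin.(2w .* t)), sum(cos.(2w .* t))) / (2w)
        c = cos.(w .* (t .- τ))
        s = sin.(w .* (t .- τ))
        (sum(y .* c)^2 / sum(abs2, c) + sum(y .* s)^2 / sum(abs2, s)) / 2
    end
end

rng = MersenneTwister(1)
n = 50
t = 10 .* sort(rand(rng, n))                          # irregular sampling times
x = cos.(2π .* t) .+ 0.1 .* randn(rng, n)             # noisy cosine
ω = 0.5:0.5:20.0                                      # angular frequency grid (assumed)

p = randperm(rng, n)                                  # a shuffle of the data ...
q = invperm(p)                                        # ... and the matching shuffle of the times

power_data_shuffled = ls_power(t, x[p], ω)            # shuffle values, keep times
power_time_shuffled = ls_power(t[q], x, ω)            # keep values, shuffle times

# Recover the surrogate from the time permutation alone.
surrogate = x[invperm(q)]
```

The two power vectors agree up to floating-point rounding, and `surrogate` equals `x[p]`, so only the time vector needs to be permuted while the search is running.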
src/methods/lombscargle.jl
Outdated
if the minkowski distance of order `q` between the power spectrum of the surrogate data and the original data is less than before.
The iteration procedure ends when the relative deviation between the periodograms is less than `tol` or when `N_total` number of tries or `N_acc` number of actual swaps is reached.

It is similar to the [`IAAFT`](@ref) method for regular time series.
Here, I'd be specific and say "For time series with equidistant time steps, the simulated annealing process results in surrogates similar to those produced by the IAAFT method."
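For readers following the docstring under review, the acceptance rule and the three stopping conditions can be sketched as below. This is a minimal illustration, not the PR's implementation: `swap_surrogate`, `minkowski` and `ls_power` are hypothetical names, the periodogram is a hand-written classical formula rather than LombScargle.jl, and the relative deviation is assumed to be the Minkowski distance divided by the Minkowski norm of the original spectrum. As in the docstring, only swaps that reduce the distance are accepted, so there is no temperature schedule.

```julia
using Random

# Classical Lomb-Scargle power (hand-written stand-in for LombScargle.jl).
function ls_power(t, x, ω)
    y = x .- sum(x) / length(x)
    map(ω) do w
        τ = atan(sum(sin.(2w .* t)), sum(cos.(2w .* t))) / (2w)
        c = cos.(w .* (t .- τ))
        s = sin.(w .* (t .- τ))
        (sum(y .* c)^2 / sum(abs2, c) + sum(y .* s)^2 / sum(abs2, s)) / 2
    end
end

# Minkowski distance of order q between two spectra.
minkowski(a, b, q) = sum(abs.(a .- b) .^ q)^(1 / q)

# Keyword names follow the docstring; the default for `tol` is an assumption.
function swap_surrogate(rng, t, x, ω; tol = 0.05, N_total = 10_000, N_acc = 2_000, q = 1)
    target = ls_power(t, x, ω)
    scale = minkowski(target, zero(target), q)        # norm of the original spectrum
    s = shuffle(rng, x)                               # start from a random shuffle
    dist = minkowski(ls_power(t, s, ω), target, q)
    tries = 0
    accepted = 0
    while tries < N_total && accepted < N_acc && dist / scale > tol
        tries += 1
        i, j = rand(rng, 1:length(s), 2)
        s[i], s[j] = s[j], s[i]                       # propose a swap
        newdist = minkowski(ls_power(t, s, ω), target, q)
        if newdist < dist
            dist = newdist                            # keep the improving swap
            accepted += 1
        else
            s[i], s[j] = s[j], s[i]                   # undo the rejected swap
        end
    end
    return s, dist / scale
end

rng = MersenneTwister(2)
t = 10 .* sort(rand(rng, 40))                         # irregular sampling times
x = cos.(2π .* t) .+ 0.1 .* randn(rng, 40)
ω = 0.5:0.5:15.0
surrogate, deviation = swap_surrogate(rng, t, x, ω; N_total = 500)
```

The returned vector is always a permutation of the input, so the amplitude distribution is preserved exactly while the periodogram is only matched approximately.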
src/methods/lombscargle.jl
Outdated
export LS
"""
LS(t; tol=1, N_total=10000, N_acc=2000,q=1)
Compute a surrogate of an irregular time series with supporting time steps `t` based on the simulated annealing algorithm described in [^SchreiberSchmitz1999].
Be explicit that it is the time axis that is irregular. As stated now, it is not clear whether the data values or their (time) indices are irregular.
@@ -0,0 +1,25 @@
# Irregular Timeseries Surrogates
Capitalization only for the first word. I'd also say "Surrogates for time series with irregular time indices", or something along those lines, to be specific about what is irregular.
# Irregular Timeseries Surrogates


To derive a surrogate for irregular time series we can use surrogate methods which are irrespective of the time axis like [`RandomShuffle`](@ref) or [`BlockShuffle`](@ref)
The statement "... surrogate methods which are irrespective of the time axis" doesn't quite make sense. Maybe it would be better to say something like "To derive a surrogate for time series which are irregularly sampled, we can use surrogate methods that do not explicitly require regularly sampled time series, for example RandomShuffle or BlockShuffle. Alternatively, we need to use algorithms that take the irregularity of the time axis into account."
To derive a surrogate for irregular time series we can use surrogate methods which are irrespective of the time axis like [`RandomShuffle`](@ref) or [`BlockShuffle`](@ref)
or we need to use algorithms, which take the irregularity of the time axis into account.

## LombScargle based surrogate
In the literature, Lomb-Scargle is used instead of LombScargle. So say "Lomb-Scargle-based surrogate"
## LombScargle based surrogate

The LS surrogate is a form of a constrained surrogate which takes the LombScargle periodogram to derive surrogates with similar phase distribution.
Lomb-Scargle again
## LombScargle based surrogate

The LS surrogate is a form of a constrained surrogate which takes the LombScargle periodogram to derive surrogates with similar phase distribution.
This sentence is incomplete. "to derive surrogates with similar phase distribution". ... similar to what? I'm guessing you mean similar phase distribution as the original time series!
@felixcremer Nice work! I have a few comments regarding the wording in the documentation that need to be fixed. Memory allocation and runtime also seem excessive. If it is really this slow, I can also have a look and see if there are any tricks to maybe avoid copying so much.
I would be happy to get a round of comments from you. I am not sure whether the docs should cover only this Lomb-Scargle method or whether we would like to include a part about surrogates for irregular time series in general. When I test this surrogate with the noisy cosine from the tests, I often get the original data with slightly shifted noise, and it looks as if I would get the original data back if I let the simulated annealing run further. I would rather expect to get a periodic signal that is shifted compared to my original data. This is still missing the better selection algorithm for the swapped values based on the rank.
I think the doc page for the method should be titled "Simulated annealing surrogates". We don't move random shuffle surrogates to a separate page, so this surrogate method shouldn't either. It is sufficient to mention in the docstring for the method that it also works on irregularly sampled data.
I will experiment a bit and read the original simulated annealing paper in more detail before I get back to this. The IAAFT method yields surrogates that do not merely phase-shift the data (that would just be time-shifted surrogates), but completely randomize the phases. That means that most of the time, you get rather different-looking time series (because oscillations of different magnitudes are shifted around). In rare cases, however, you could get time series that are practically identical to the original. The only requirement is the preservation (to some tolerance) of the power spectrum. It may very well happen that for a single surrogate realization, the randomization roughly preserves the low frequencies of the original data in the time domain, but shuffles the high frequencies / noise (as you describe). However, this only happens occasionally. Is this behaviour consistent over multiple surrogate realizations? I'm guessing that the simulated annealing process should also generate time series that are phase-randomized in the same manner as for the IAAFT method, but I'm not entirely sure I completely understand the procedure. I'll get back to you once I have studied the paper more closely.
These criteria don't necessarily need to be included at this stage. As long as the method docstring states which parts of the original implementations are included and which parts are not, we should be fine. But I'll have a look at that too when I study the paper; maybe it's not that complicated to implement?
@felixcremer thanks for the work! My review would be identical to Kristian's, so I won't "spam" the same stuff! Unfortunately I don't have the spare time now to look at the papers, so I don't feel I can help much further with the details, sorry!
Yes, this is after I started it with a short time series, N = 10. As we can see, the increase is nearly O(N^2), which is also the expected behaviour described in the paper.
Yes please, but I have profiled it and most of the time is spent in the computation of the Lomb-Scargle periodogram. So we would either need to find a way to compute only the parts of the Lomb-Scargle periodogram which are changed by swapping two values, or we would need to reduce the overall number of steps in which we need to compare periodograms.
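One possible reading of the first option, sketched under assumptions and not taken from the PR: with fixed times, the time offset τ and the cosine and sine terms do not depend on the data, so swapping two values changes each projection sum by a single term per frequency. All names below (`ls_basis`, `project`, `ls_power`, `swap_update!`) are hypothetical, and the classical Lomb-Scargle formula is written out by hand instead of calling LombScargle.jl.

```julia
# Data-independent cosine and sine terms for fixed times t and angular frequencies ω.
function ls_basis(t, ω)
    C = zeros(length(t), length(ω))
    S = zeros(length(t), length(ω))
    for (k, w) in enumerate(ω)
        τ = atan(sum(sin.(2w .* t)), sum(cos.(2w .* t))) / (2w)
        C[:, k] .= cos.(w .* (t .- τ))
        S[:, k] .= sin.(w .* (t .- τ))
    end
    return C, S
end

# Projections of mean-free data y onto a basis, one entry per frequency.
project(B, y) = vec(sum(B .* y; dims = 1))

# Lomb-Scargle power from the two projections.
ls_power(a, b, C, S) =
    (a .^ 2 ./ vec(sum(abs2, C; dims = 1)) .+ b .^ 2 ./ vec(sum(abs2, S; dims = 1))) ./ 2

# Swap y[i] and y[j] and update both projections with one term per frequency.
function swap_update!(a, b, C, S, y, i, j)
    d = y[j] - y[i]
    a .+= d .* (C[i, :] .- C[j, :])
    b .+= d .* (S[i, :] .- S[j, :])
    y[i], y[j] = y[j], y[i]
    return a, b
end

t = [0.0, 0.7, 1.1, 2.6, 3.0, 4.4, 5.9, 6.3]          # irregular sampling times
x = [1.0, 0.4, -0.2, -0.9, -0.7, 0.3, 0.8, 0.1]
ω = 0.5:0.5:5.0
y = x .- sum(x) / length(x)                           # the mean is unchanged by swaps

C, S = ls_basis(t, ω)
a, b = project(C, y), project(S, y)
swap_update!(a, b, C, S, y, 2, 6)                     # incremental update after one swap
power_after_swap = ls_power(a, b, C, S)
```

After the update, `a` and `b` match a full recomputation from the swapped data up to rounding, while each swap costs work proportional to the number of frequencies rather than to the number of samples times the number of frequencies. Whether this fits the plan-based interface of LombScargle.jl is not something the discussion settles.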
@Datseris I'll have a final look at this tomorrow, and hopefully we can merge and finalize the JOSS paper after that.
@Datseris There are some changes to LombScargle.jl that lead to the code in this PR not working. I can't directly add changes to this branch without pushing a new PR to Felix's fork, which is cumbersome, so I'm merging this to master, then creating a new branch with my updates.
@felixcremer thank you for the PR. Please, when you do PRs in open source software, leave the button "allow commits by maintainers" checked.
This adds a new method to construct surrogates for irregular time series based on
the Lomb-Scargle periodogram, as discussed in #12.
At the moment this is missing tests, docs and some low-hanging speed-ups.
I am going to add docstrings and try to make it faster tomorrow.
I checked it visually and it seems to give reasonable surrogates, but unfortunately it is really slow at the moment.
I am sure that we can get a little bit faster by reusing some memory.
The commented-out code was an attempt to fully implement the algorithm as described in
Schmitz and Schreiber, but this destroys the convergence of the algorithm.
This also doesn't yet implement the smarter drawing strategies described in the paper.