[WIP] Online beat trackers #314
base: main

Conversation
Nice work! I am not sure about some of the defaults (e.g. the 5 frames look-back or a threshold of 0.1); these should at least be documented somehow or reconsidered completely -- e.g. the 5 frames could maybe be expressed as the minimum interval or something like that.
madmom/features/beats.py
Outdated
buffer = self.buffer(activation)
# create an online interval histogram
histogram = self.tempo_estimator.online_interval_histogram(
    activation, reset=False)
You should pass the `reset` argument to the `tempo_estimator` as well, i.e. `reset=reset`. Otherwise the beat tracker resets its state, but the tempo estimation still keeps its old state.
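In code, the suggested change to the snippet above would be (a sketch):

```python
# forward the caller's reset flag instead of hard-coding False
histogram = self.tempo_estimator.online_interval_histogram(
    activation, reset=reset)
```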
Hmm, actually I think it's ok the way it is? The `tempo_estimator` gets reset inside the `reset()` method of the BeatTracker, which might be called whenever `process_online()` is called.
If I want to compute a whole song, then `process_online()` gets called only once with all the activations and `reset=True`. If I used `reset=reset` here, then while iterating through all activations it would reset the `tempo_estimator` for each activation.
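A sketch of the two calling patterns being discussed; `processor` is assumed to be a `BeatTrackingProcessor` created with `online=True`, and `act` a song's complete beat activation function (1D numpy array):

```python
import numpy as np

# whole song at once: a single call with all activations, state is reset
beats = processor.process_online(act, reset=True)

# frame by frame: state must be kept between calls, so reset stays False
for frame in act:
    beats = processor.process_online(np.atleast_1d(frame), reset=False)
```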
Ok, I missed the resetting inside `reset()`. Anyway, I still think passing `reset=reset` is a cleaner solution than calling `self.tempo_estimator.reset()`. When iterating over activations frame by frame, `reset` must be set to `False` of course -- and this is exactly what is done in online mode.
Ah sorry, I forgot to pass the `online=True` parameter to `process_single()` (I'm calling stuff with a script). Never mind, works now ;)
madmom/features/beats.py
Outdated
    activation, reset=False)
# get the dominant interval
interval = self.tempo_estimator.dominant_interval(histogram)
# compute current and next beat times
This comment is a bit misleading, especially the "next beat" part. If I get it right, it is the earliest position at which the next beat can possibly occur, right?
madmom/features/beats.py
Outdated
# for every frame the tempo may change and therefore the last beat
# can easily be missed.
if len(detections) > 0 and detections[-1] >= len(buffer) - 5 and \
        buffer[detections[-1]] > self.threshold:
The indentation is weird and the if-statement is very long. I'd prefer to have multiple logical expressions and then test for them, e.g.:

if len(detections):
    cond_1 = detections[-1] >= len(buffer) - 5
    cond_2 = buffer[detections[-1]] > self.threshold
    if cond_1 and cond_2:

(Note that the emptiness check must come first, otherwise indexing `detections[-1]` fails.) Of course, more meaningful names would be nice, but you get the idea. What I still don't like are hard-coded values such as the `-5`. Why 5 frames?
I just tested different values here and this one seemed to work best. But I know what you mean... maybe I can find something more rational. For the threshold, the intention was that we probably don't want false positive predictions on silent signals. This also seems to improve the dataset results a bit.
I was never questioning the rationale behind using a threshold, e.g. `DBNBeatTrackingProcessor` uses it for the very same reason 😏. Back then I added the threshold for the DBN stuff and did not back-port it to the other methods. Maybe we should do so now and add it to the other methods (after evaluating its positive effect, of course).
Some more notes regarding the threshold:
- What I do is determine the first/last frame where the activation is above/below the threshold and use only the segment in between (see the sketch after this list). Although this works quite nicely, I am not sure whether it is not simply some sort of adaptation towards the annotations...
- The same could be done for multiple segments, not just one per song. Although this sounds logical and straightforward, I am not sure if this segmentation helps in general. We must be aware of the many songs which have segments without any perceivable beats where counting continues. For these cases, splitting into segments doesn't make sense at all.
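A minimal sketch of this single-segment trimming, assuming a 1D numpy array of activations (the function name and the returned offset are illustrative, not madmom API):

```python
import numpy as np

def trim_to_active_segment(activations, threshold=0.1):
    """Keep only the part between the first and last frame whose
    activation exceeds the threshold; also return the frame offset."""
    above = np.nonzero(activations > threshold)[0]
    if not len(above):
        # nothing above the threshold, return an empty segment
        return activations[:0], 0
    first, last = above[0], above[-1]
    return activations[first:last + 1], first
```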
Ah ok, cool! Yeah, using only "active" segments seems like a good idea. Maybe the distance between those active segments gives a hint whether tracking should stop or not.
For online mode it's a bit different anyway, because we don't know how long a break will be, so we have to decide on the spot (or maybe with a bit of delay).
# display tempo
display.append('| %5.1f | ' % float(self.fps * 60 / interval))
sys.stderr.write('\r%s' % ''.join(display))
sys.stderr.flush()
We should consider refactoring the visualisation stuff as a mixin which provides a `visualize()` method. `BeatTrackingProcessor` and `DBNBeatTrackingProcessor` could then inherit from this mixin and simply call `visualize()` whenever needed. The mixin should have/keep its own state and counters. Some more notes: we should consider using constants to define how long the beats are displayed (e.g. `self.beat_counter = DISPLAY_DURATION`) or the inertia of the RNN beat activation strength (e.g. `DISPLAY_INERTIA`).
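A hypothetical sketch of what such a mixin could look like; all names and the display format are placeholders, not existing madmom API:

```python
import sys

DISPLAY_DURATION = 10   # frames a detected beat stays highlighted
DISPLAY_INERTIA = 0.9   # decay factor for the displayed activation strength


class VisualizationMixin(object):
    """Mixin that keeps its own display state and counters."""

    def __init__(self):
        self.beat_counter = 0
        self.strength = 0.

    def visualize(self, activation, beat, tempo):
        # let the displayed activation strength decay with some inertia
        self.strength = max(float(activation),
                            self.strength * DISPLAY_INERTIA)
        if beat:
            # keep the beat marker visible for DISPLAY_DURATION frames
            self.beat_counter = DISPLAY_DURATION
        marker = '*' if self.beat_counter else ' '
        self.beat_counter = max(0, self.beat_counter - 1)
        # render activation strength as a bar, followed by the tempo
        bar = '#' * int(round(self.strength * 20))
        sys.stderr.write('\r%s %-20s | %5.1f | ' % (marker, bar, tempo))
        sys.stderr.flush()
```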
I think that would be very nice! Every beat tracker could show the activation strength, the tempo, and whether the current frame is a beat. My only concern is that different trackers may want to show different additional information (like the DBNTracker does with the state space).
tests/test_features_beats.py
Outdated
@@ -79,6 +79,25 @@ def test_process(self):
    self.assertTrue(np.allclose(beats, [0.11, 0.45, 0.79, 1.13, 1.47,
                                        1.81, 2.15, 2.49]))

def test_process_online(self):
    processor = BeatTrackingProcessor(fps=sample_blstm_act.fps,
Minor note: use `fps=sample_lstm_act.fps` instead.
Thx for all your comments, will work through all of them and then amend ;)
@@ -390,6 +390,51 @@ def recursive(position):
    return np.array(positions)


def detect_beats_with_ccf(activations, interval, look_aside=0.2):
I am actually not sure if you want to keep this method :) It is completely interchangeable with `detect_beats()` but seems to produce results which are a little bit worse. On the other hand, it's much faster. Maybe it's an option for the BeatDetector (with constant tempo).
The problem you are facing here is the fact that many musical pieces simply don't have a constant tempo.
Although you're allowing some deviation from a constant tempo, this is only considered for each beat individually, i.e. the tempo remains stable and only some jitter is allowed. If one beat is a bit earlier (faster), the position of the next is not based on the current beat, but rather on the previously pre-computed positions.
`detect_beats`, on the other hand, also allows some deviation from the main tempo, but the position of the next beat is based on the current beat's position rather than the pre-computed ones. This makes the method more flexible, but also slower.
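A rough sketch of the CCF approach as just described, assuming integer frame intervals; only the signature `(activations, interval, look_aside=0.2)` is taken from the diff above, the body is illustrative, not the actual implementation:

```python
import numpy as np

def ccf_beats_sketch(activations, interval, look_aside=0.2):
    # cross-correlate the activations with a pulse train of the given
    # interval, i.e. score every possible phase of a fixed beat grid
    scores = [activations[phase::interval].sum() for phase in range(interval)]
    phase = int(np.argmax(scores))
    # pre-compute equally spaced beat positions for the best phase ...
    positions = np.arange(phase, len(activations), interval)
    # ... and let each beat jitter by +/- look_aside * interval around its
    # pre-computed position, independently of all other beats
    delta = int(look_aside * interval)
    beats = []
    for pos in positions:
        lo = max(0, pos - delta)
        hi = min(len(activations), pos + delta + 1)
        beats.append(lo + int(np.argmax(activations[lo:hi])))
    return np.array(beats)
```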
If `detect_beats_with_ccf` performs inferior to `detect_beats` on most datasets, I don't see a point in including it, even if it is faster. I'd rather speed up `detect_beats` if speed is really an issue.
Yes, I think the same! At least I can use it for my thesis and evaluate it :)
I'm very curious now about the multi-agent approach because, similar to `detect_beats`, it always jumps from beat to beat but considers multiple tempo hypotheses, and it should be a very good fit for online mode 😋
I merged #292, please rebase your branch on master.
Ok, rebased this branch and applied your fixes for the stuck AppVeyor builds to the tests. I think the -0.1% coverage is due to the CCF commit (which can be omitted). Anyway, it would be awesome if you could take it from here \m/
This PR extends the BeatTracker with online capability, based on #292.
A simple CCF beat detection method was also added for experimenting.