
Commit

Merge pull request #1 from pmhalvor/add-detection-stage
Add detection stage
pmhalvor committed Sep 21, 2024
2 parents 0348b3b + 8625848 commit b20b7bc
Showing 15 changed files with 650 additions and 54 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
# Repo specific ignores
audio/

plots/

# Python basic ignores
# Byte-compiled / optimized / DLL files
72 changes: 58 additions & 14 deletions docs/ladr/LADR_0001_lightweight_detections.md
@@ -1,38 +1,51 @@
# Lightweight Detection Mechanisms

This doc's purpose is to consider the different options for simple whale vocalization detection mechanisms.
The chosen mechanism will sift through the large audio data to find the most interesting parts to feed into the machine learning model.

## Background
The purpose of this pipeline to is efficiently detect vocalizations from whales encountered on [HappyWhale](https://happywhale.com/).
The main goal of this pipeline is to efficiently detect vocalizations from whales encountered on [HappyWhale](https://happywhale.com/).
Since audio data is notoriously large, we want to quickly find which chunks of audio are most important to look at, before classifying them via the [humpback_whale model](https://tfhub.dev/google/humpback_whale/1).

Initially, the data that does not make it past the filter will not be fed through the machine learning model, in order to keep costs down.
This means we need a filtering mechanism that is "generous" in what it flags, but not so generous that it flags everything.

I'm still learning about different options, so this doc will be updated as I learn more.

## Options

### Energy filter
The simplest of the filters: it just measures peaks in the audio signal above a specified threshold.
We would need to normalize, then threshold, the audio signal to detect peaks.
We could also take a root mean square (RMS) over a short window to detect high-energy sections; a rough sketch follows the lists below.

#### Pros
- very lightweight
- easy to implement

#### Cons
- too much noise can make it through
- too much noise will make it through
- not very specific
- prioritizes loudness over frequency
- sounds from a distance will likely not be detected
- too rudimentary alone, but good to combine w/ frequency filters, for example
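For illustration, a minimal RMS-energy detector might look like the sketch below. The window size and threshold are placeholder assumptions, not tuned values.

```python
import numpy as np

def energy_detections(signal: np.ndarray, window: int = 1024, threshold: float = 0.1) -> np.ndarray:
    """Return the start sample of every window whose RMS energy exceeds the threshold."""
    # Normalize to [-1, 1] so a single threshold works across recordings.
    normalized = signal / (np.max(np.abs(signal)) + 1e-12)
    # RMS over non-overlapping windows.
    n_windows = len(normalized) // window
    frames = normalized[: n_windows * window].reshape(n_windows, window)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return np.flatnonzero(rms > threshold) * window
```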


### Butterworth Bandpass Filter
[Wikipedia](https://en.wikipedia.org/wiki/Butterworth_filter)
### [Butterworth Passband Filter](https://en.wikipedia.org/wiki/Butterworth_filter)

Filters out audio that does not contain a certain frequency range.
Only allows a particular band of frequencies to pass through.
This band is determined by the filter's low and high cutoff frequencies, together with its order.
The order of the filter determines how steep the roll-off is, i.e. how "boxy" the filter is.
The higher the order, the steeper the roll-off, and the sharper the corners of the top of the filter.

![bandpass filter](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Bandwidth.svg/320px-Bandwidth.svg.png)

For our use case, we want to find audio with frequencies matching expected whale vocalization frequencies.
If the audio over a specified time window contains frequencies inside the band, it is flagged as a detection.
![detections](../../img/detections.png)

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Bandwidth.svg/320px-Bandwidth.svg.png)
To get the most out of this type of filter, we would likely need to use a handful of them, each focusing on its own frequency range.

#### Pros
- more specific than energy filter
@@ -42,29 +55,57 @@ This band is determined by the filter's low and high cutoff frequencies, together with its order.
- easy to implement
- can be used together w/ other filtering methods


#### Cons
- room for improvement on specificity
- fixed window size (needs to be tuned)
- assumes a certain frequency range is the most important
- disregards harmonics
- clicks may not be detected
- different ages/individuals may have different frequency ranges (?)
- not great at detecting sounds from a distance
- not great at detecting vocalizations from a distance


### [Chebyshev Passband Filter](https://en.wikipedia.org/wiki/Chebyshev_filter)
Slightly more complex than the Butterworth filter, but with a steeper roll-off.
A Type I Chebyshev filter has ripple in the passband, which can be tuned to be more or less aggressive; the Type II variant shown below keeps the ripple in the stopband instead.
It can also notch out specific frequencies in the stopband, for example a ship's engine noise. A rough sketch follows the lists below.

![chebyshev](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b7/ChebyshevII_response-en.svg/2880px-ChebyshevII_response-en.svg.png)


#### Pros
- more specific than Butterworth
- can be tuned to specific frequencies
- can be used together w/ other filtering methods
- can remove specific frequencies

#### Cons
- more computationally expensive than Butterworth (though I don't think by much; need to test)
- more difficult to interpret
- no experience with it
- ...
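As a rough sketch, scipy's Chebyshev Type II design could be wrapped much like the Butterworth example elsewhere in this PR; the order and stopband attenuation below are illustrative assumptions, not tuned values.

```python
from scipy.signal import cheby2, sosfilt

def cheby_bandpass_filter(data, lowcut, highcut, fs, order=5, stop_atten_db=40):
    # rs (here stop_atten_db) is the minimum attenuation required in the stopband, in dB.
    sos = cheby2(order, stop_atten_db, [lowcut, highcut],
                 btype="bandpass", output="sos", fs=fs)
    return sosfilt(sos, data)
```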


### [Spectrogram](https://en.wikipedia.org/wiki/Spectrogram)
A [short-time Fourier transform](https://en.wikipedia.org/wiki/Short-time_Fourier_transform) visualizes the frequencies of a signal as they vary with time.
This is a 3D representation of the audio signal, where the x-axis is time, the y-axis is frequency, and color encodes the energy at that frequency. A rough sketch of a spectrogram-based detector follows the lists below.

![spectrogram](../../img/spectrogram.png)

### Spectrogram
A visual representation of the spectrum of frequencies of a signal as it varies with time.
This is a 2D representation of the audio signal, where the x-axis is time, y-axis is frequency, and color is amplitude.

#### Pros
- can be used to detect harmonics
- can be used to detect clicks
- can be used to detect sounds from a distance
- can be used to detect multiple species
- can be calibrated to find specific frequencies
- energy threshold can be applied to find exact time-range of matching frequencies
- might need to be dynamic to adjust for different distances
- visually intuitive

#### Cons
- computationally expensive
- not lightweight
- more difficult to work with (2D data)
- slightly more computationally expensive (still less than model)
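A rough sketch of this approach: compute the spectrogram, sum the energy per time bin inside a target band, and flag bins above a threshold. The band edges, window size, and threshold are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

def spectrogram_detections(signal, fs, lowcut=50.0, highcut=1500.0,
                           nperseg=512, threshold=0.015):
    """Return the times (in seconds) of spectrogram bins with high in-band energy."""
    f, t, sxx = spectrogram(signal, fs=fs, nperseg=nperseg)
    band = (f >= lowcut) & (f <= highcut)
    # Total energy per time bin within the band of interest.
    band_energy = sxx[band].sum(axis=0)
    # Normalize so the threshold is independent of overall loudness.
    band_energy /= band_energy.max() + 1e-12
    return t[band_energy > threshold]
```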


### Distilled Student-Teacher model
We could set up a lightweight NN classifier
@@ -96,6 +137,8 @@ any whale presence, i.e. most of the time.
The final option is to just use the model directly on the data surrounding an encounter.
This is the most expensive option, but also the most accurate.

![model results](../../img/model_results.png)

#### Pros
- most accurate
- "smartest" filter
@@ -106,6 +149,7 @@


## Decision
Initially, I will use the Butterworth Bandpass Filter, as it is the most lightweight and specific of the options.
In my brainstorming notebook, I used the Butterworth Bandpass Filter, since it was the simplest of the filters considered.
The pipeline should probably be built to allow easy swapping of filters via the config file, enabling easier experimentation later.
From this, I'll gather results and see whether going straight to the model would be better.
This doc will be updated with the results of the experiment.
80 changes: 80 additions & 0 deletions examples/butterworth.py
@@ -0,0 +1,80 @@
"""
Example provided at: https://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html
"""
from scipy.signal import butter, lfilter, sosfilt


def butter_bandpass(lowcut, highcut, fs, order=5, output="ba"):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    return butter(order, [low, high], btype='band', output=output)


def butter_bandpass_filter(data, lowcut, highcut, fs, order=5, output="sos"):
    butter_values = butter_bandpass(lowcut, highcut, fs, order=order, output=output)
    if output == "ba":
        b, a = butter_values
        y = lfilter(b, a, data)
    elif output == "sos":
        sos = butter_values
        y = sosfilt(sos, data)
    else:
        raise ValueError(f"Unsupported output type: {output}")
    return y


def run():
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import freqz

    # Sample rate and desired cutoff frequencies (in Hz).
    fs = 5000.0
    lowcut = 600.0
    highcut = 1250.0

    # Plot the frequency response for a few different orders.
    plt.figure(1)
    plt.clf()
    for order in [2, 4, 6]:  # alternative: [3, 6, 9]
        b, a = butter_bandpass(lowcut, highcut, fs, order=order, output="ba")
        w, h = freqz(b, a, worN=2000)
        plt.plot((fs * 0.5 / np.pi) * w, abs(h), label="order = %d" % order)

    plt.plot([0, 0.5 * fs], [np.sqrt(0.5), np.sqrt(0.5)],
             '--', label='sqrt(0.5)')
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Gain')
    plt.grid(True)
    plt.legend(loc='best')

    # Filter a noisy signal.
    T = 0.05
    nsamples = int(T * fs)
    t = np.linspace(0, T, nsamples, endpoint=False)
    a = 0.05  # sine wave amplitude (used here to emphasize our desired frequency)
    f0 = 600.0
    x = 0.1 * np.sin(2 * np.pi * 1.2 * np.sqrt(t))  # slow, low-frequency sweep
    x += 0.01 * np.cos(2 * np.pi * 312 * t + 0.1)  # below the passband
    x += 0.01 * np.cos(2 * np.pi * 510 * t + 0.1)  # just below the lowcut (600 Hz)
    x += 0.01 * np.cos(2 * np.pi * 520 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 530 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 540 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 550 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 1200 * t + 0.1)  # inside the passband
    x += a * np.cos(2 * np.pi * f0 * t + .11)  # desired frequency, at the lower passband edge
    x += 0.03 * np.cos(2 * np.pi * 2000 * t)  # above the passband
    plt.figure(2)
    plt.clf()
    plt.plot(t, x, label='Noisy signal')

    y = butter_bandpass_filter(x, lowcut, highcut, fs, order=6)
    plt.plot(t, y, label='Filtered signal')
    plt.xlabel('time (seconds)')
    plt.hlines([-a, a], 0, T, linestyles='--')
    plt.grid(True)
    plt.axis('tight')
    plt.legend(loc='upper left')

    plt.show()


if __name__ == "__main__":
    run()
15 changes: 7 additions & 8 deletions examples/notebooks/adaptedpacificsounddetecthumpbacksong.py
@@ -461,7 +461,6 @@ def run(

"""## Filters
### Me
I want to play around with different filters for the whale noises.
Applying the filter to the signal will allow me to emphasize some signals and mitigate others.
"""
@@ -477,29 +476,29 @@ def run(
    sample_rate = sample_rate,
)

def low_pass(
def average_filter(
    signal = signal,
    factor = 0.01,
    size = 100,
    factor = 0.1,
):
    low_pass_filter = np.ones(size)*factor
    filtered_signal = np.convolve(signal, low_pass_filter)
    average_array = (np.ones(size)/size)*factor
    filtered_signal = np.convolve(signal, average_array)
    return filtered_signal


factor = 0.1
size = 100

filtered_signal = low_pass(signal, factor=factor, size=size)
filtered_signal = average_filter(signal, factor=factor, size=size)
# plot_play(filtered_signal, title=f"Low-pass filter - factor:{factor} size:{size}")

factor = 0.1
size = 10

filtered_signal = low_pass(signal, factor=factor, size=size)
filtered_signal = average_filter(signal, factor=factor, size=size)
# plot_play(filtered_signal, title=f"Low-pass filter - factor:{factor} size:{size}")

"""### Claude"""
"""### Ask an agent"""

(
sample_start,
Binary file added img/detections.png
Binary file added img/model_results.png
Binary file added img/spectrogram.png
22 changes: 16 additions & 6 deletions src/pipeline/app.py
@@ -2,22 +2,32 @@

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
from stages.search import GeometrySearch
from stages.audio import RetrieveAudio, WriteAudio
from stages.audio import RetrieveAudio, WriteAudio, WriteSiftedAudio
from stages.sift import Butterworth

from config import load_pipeline_config
config = load_pipeline_config()

def run():
    # Initialize pipeline options
    pipeline_options = PipelineOptions()
    pipeline_options.view_as(SetupOptions).save_main_session = True
    args = {
        "start": config.input.start,
        "end": config.input.end
    }

    with beam.Pipeline(options=pipeline_options) as p:
        input_data = p | "Create Input" >> beam.Create([{'start': '2016-12-21T00:30:0', 'end':"2016-12-21T00:40:0"}])
        search_results = input_data | "Run Geometry Search" >> beam.ParDo(GeometrySearch())
        audio_results = search_results | "Retrieve Audio" >> beam.ParDo(RetrieveAudio())
        # filtered_audio = audio_results | "Filter Frequency" >> FilterFrequency()
        input_data = p | "Create Input" >> beam.Create([args])
        search_results = input_data | "Run Geometry Search" >> beam.ParDo(GeometrySearch())

        audio_results = search_results | "Retrieve Audio" >> beam.ParDo(RetrieveAudio())
        audio_files = audio_results | "Store Audio (temp)" >> beam.ParDo(WriteAudio())

        sifted_audio = audio_results | "Sift Audio" >> Butterworth()
        sifted_audio_files = sifted_audio | "Store Sifted Audio" >> beam.ParDo(WriteSiftedAudio("butterworth"))

        # For debugging, you can write the output to a text file
        audio_files = audio_results | "Store Audio (temp)" >> beam.ParDo(WriteAudio())
        # audio_files | "Write Audio Output" >> beam.io.WriteToText('audio_files.txt')
        # search_results | "Write Search Output" >> beam.io.WriteToText('search_results.txt')

25 changes: 17 additions & 8 deletions src/pipeline/config.yaml
@@ -1,6 +1,7 @@
pipeline:
  general:
    verbose: true
    debug: true

  input:
    start: "2016-12-21T00:30:00"
@@ -28,16 +29,24 @@ pipeline:
    source_sample_rate: 16000
    margin: 30 # TODO set to 900 # seconds
    offset: 13 # TODO set to 0 # hours
    output_path_template: "data/audio/{year}/{month:02}/{filename}"
    skip_existing: false

  detection_filter:
    highcut: 1500
    lowcut: 50
    order: 10
    frequency_threshold: 0.015
    output_path_template: "data/audio/raw/{year}/{month:02}/{filename}"
    skip_existing: false # if true, skip downstream processing of existing audio files (false during development)

  sift:
    output_path_template: "data/audio/{sift}/{year}/{month:02}/{filename}"
    max_duration: 600 # seconds
    plot: true
    plot_path_template: "data/plots/{sift}/{year}/{month:02}/{day:02}/{plot_name}.png"
    window_size: 512

    # Specific sift-mechanism parameters
    butterworth:
      highcut: 1500
      lowcut: 50
      order: 5
      output: "sos" # "sos" or "ba"
      sift_threshold: 0.015

  model:
    url: https://tfhub.dev/google/humpback_whale/1
    model_sample_rate: 10000
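For illustration, the `sift.butterworth` block above might be consumed roughly as follows. The attribute paths (`config.sift.butterworth`, `config.audio.source_sample_rate`) are assumptions modeled on the `config.input.*` access in `app.py`.

```python
from scipy.signal import butter

from config import load_pipeline_config

config = load_pipeline_config()
bw = config.sift.butterworth  # attribute path assumed from the YAML layout above
sos = butter(
    bw.order,
    [bw.lowcut, bw.highcut],
    btype="band",
    output="sos",  # config.yaml also allows "ba"; "sos" is more numerically stable
    fs=config.audio.source_sample_rate,  # assumed lookup
)
```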