
Commit

Merge pull request #1 from pmhalvor/add-detection-stage
Add detection stage
pmhalvor committed Sep 21, 2024
2 parents 0348b3b + 8625848 commit b20b7bc
Showing 15 changed files with 650 additions and 54 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
# Repo specific ignores
audio/

plots/

# Python basic ignores
# Byte-compiled / optimized / DLL files
72 changes: 58 additions & 14 deletions docs/ladr/LADR_0001_lightweight_detections.md
@@ -1,38 +1,51 @@
# Lightweight Detection Mechanisms

This doc's purpose is to consider the different options for simple whale vocalization detection mechanisms.
The chosen mechanism will sift through the large audio data to find the most interesting parts to feed into the machine learning model.

## Background
The purpose of this pipeline to is efficiently detect vocalizations from whales encountered on [HappyWhale](https://happywhale.com/).
The main goal of this pipeline is to efficiently detect vocalizations from whales encountered on [HappyWhale](https://happywhale.com/).
Since audio data is notoriously large, we want to quickly find which chunks of audio are most important to look at, before classifying them via the [humpback_whale model](https://tfhub.dev/google/humpback_whale/1).

Initially, the data that does not make it past the filter will not be fed through the machine learning model, in order to keep costs down.
This means we need a filtering mechanism that is "generous" in what it flags, but not so generous that it flags everything.

I'm still learning about different options, so this doc will be updated as I learn more.

## Options

### Energy filter
The simplest of the filters: it just measures peaks in the audio signal above a specified threshold.
We would need to normalize, then threshold, the audio signal to detect peaks.
We could also take a root mean square (RMS) over a short window to detect high-energy sections; a rough sketch follows the lists below.

#### Pros
- very lightweight
- easy to implement

#### Cons
- too much noise can make it through
- too much noise will make it through
- not very specific
- prioritizes loudness over frequency
- sounds from a distance will likely not be detected
- too rudimentary alone, but good to combine w/ frequency filters, for example
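For illustration, a minimal RMS-energy detector might look like the sketch below. The window size and threshold are placeholder assumptions, not tuned values.

```python
import numpy as np

def energy_detections(signal: np.ndarray, window: int = 1024, threshold: float = 0.1) -> np.ndarray:
    """Return the start sample of every window whose RMS energy exceeds the threshold."""
    # Normalize to [-1, 1] so a single threshold works across recordings.
    normalized = signal / (np.max(np.abs(signal)) + 1e-12)
    # RMS over non-overlapping windows.
    n_windows = len(normalized) // window
    frames = normalized[: n_windows * window].reshape(n_windows, window)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return np.flatnonzero(rms > threshold) * window
```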


### Butterworth Bandpass Filter
[Wikipedia](https://en.wikipedia.org/wiki/Butterworth_filter)
### [Butterworth Passband Filter](https://en.wikipedia.org/wiki/Butterworth_filter)

Filters out audio that does not contain a certain frequency range.
Only allows a particular band of frequencies to pass through.
This band is determined by the filter's low and high cutoff frequencies, together with its order.
The order of the filter determines how steep the roll-off is, i.e. how "boxy" the filter is.
The higher the order, the steeper the roll-off, and the sharper the corners of the top of the filter.

![bandpass filter](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Bandwidth.svg/320px-Bandwidth.svg.png)

For our use case, we want to find audio with frequencies matching expected whale vocalization frequencies.
If the audio over a specified time window contains frequencies inside the band, it is flagged as a detection.
![detections](../../img/detections.png)

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Bandwidth.svg/320px-Bandwidth.svg.png)
To get the most out of this type of filter, we would likely need to use a handful of them, each focusing on its own frequency range.

#### Pros
- more specific than energy filter
@@ -42,29 +55,57 @@ This band is determined by the filter's low and high cutoff frequencies, together with its order.
- easy to implement
- can be used together w/ other filtering methods


#### Cons
- room for improvement on specificity
- fixed window size (needs to be tuned)
- assumes a certain frequency range is the most important
- disregards harmonics
- clicks may not be detected
- different ages/individuals may have different frequency ranges (?)
- not great at detecting sounds from a distance
- not great at detecting vocalizations from a distance


### [Chebyshev Passband Filter](https://en.wikipedia.org/wiki/Chebyshev_filter)
Slightly more complex than the Butterworth filter, but with a steeper roll-off.
A Type I Chebyshev filter has ripple in the passband, which can be tuned to be more or less aggressive; the Type II variant shown below keeps the ripple in the stopband instead.
It can also notch out specific frequencies in the stopband, for example a ship's engine noise. A rough sketch follows the lists below.

![chebyshev](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b7/ChebyshevII_response-en.svg/2880px-ChebyshevII_response-en.svg.png)


#### Pros
- more specific than Butterworth
- can be tuned to specific frequencies
- can be used together w/ other filtering methods
- can remove specific frequencies

#### Cons
- more computationally expensive than Butterworth (though I don't think by much; need to test)
- more difficult to interpret
- no experience with it
- ...
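As a rough sketch, scipy's Chebyshev Type II design could be wrapped much like the Butterworth example elsewhere in this PR; the order and stopband attenuation below are illustrative assumptions, not tuned values.

```python
from scipy.signal import cheby2, sosfilt

def cheby_bandpass_filter(data, lowcut, highcut, fs, order=5, stop_atten_db=40):
    # rs (here stop_atten_db) is the minimum attenuation required in the stopband, in dB.
    sos = cheby2(order, stop_atten_db, [lowcut, highcut],
                 btype="bandpass", output="sos", fs=fs)
    return sosfilt(sos, data)
```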


### [Spectrogram](https://en.wikipedia.org/wiki/Spectrogram)
A [short-time Fourier transform](https://en.wikipedia.org/wiki/Short-time_Fourier_transform) visualizes the frequencies of a signal as they vary with time.
This is a 3D representation of the audio signal, where the x-axis is time, the y-axis is frequency, and color encodes the energy at that frequency. A rough sketch of a spectrogram-based detector follows the lists below.

![spectrogram](../../img/spectrogram.png)

### Spectrogram
A visual representation of the spectrum of frequencies of a signal as it varies with time.
This is a 2D representation of the audio signal, where the x-axis is time, y-axis is frequency, and color is amplitude.

#### Pros
- can be used to detect harmonics
- can be used to detect clicks
- can be used to detect sounds from a distance
- can be used to detect multiple species
- can be calibrated to find specific frequencies
- energy threshold can be applied to find exact time-range of matching frequencies
- might need to be dynamic to adjust for different distances
- visually intuitive

#### Cons
- computationally expensive
- not lightweight
- more difficult to work with (2D data)
- slightly more computationally expensive (still less than model)
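A rough sketch of this approach: compute the spectrogram, sum the energy per time bin inside a target band, and flag bins above a threshold. The band edges, window size, and threshold are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

def spectrogram_detections(signal, fs, lowcut=50.0, highcut=1500.0,
                           nperseg=512, threshold=0.015):
    """Return the times (in seconds) of spectrogram bins with high in-band energy."""
    f, t, sxx = spectrogram(signal, fs=fs, nperseg=nperseg)
    band = (f >= lowcut) & (f <= highcut)
    # Total energy per time bin within the band of interest.
    band_energy = sxx[band].sum(axis=0)
    # Normalize so the threshold is independent of overall loudness.
    band_energy /= band_energy.max() + 1e-12
    return t[band_energy > threshold]
```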


### Distilled Student-Teacher model
We could set up a lightweight NN classifier
@@ -96,6 +137,8 @@ any whale presence, i.e. most of the time.
The final option is to just use the model directly on the data surrounding an encounter.
This is the most expensive option, but also the most accurate.

![model results](../../img/model_results.png)

#### Pros
- most accurate
- "smartest" filter
@@ -106,6 +149,7 @@


## Decision
Initially, I will use the Butterworth Bandpass Filter, as it is the most lightweight and specific of the options.
In my brainstorming notebook, I used the Butterworth Bandpass Filter, since it was the simplest of the filters considered.
The pipeline should probably be built to allow easy swapping of filters via the config file, enabling easier experimentation later.
From this, I'll gather results and see whether going straight to the model would be better.
This doc will be updated with the results of the experiment.
80 changes: 80 additions & 0 deletions examples/butterworth.py
@@ -0,0 +1,80 @@
"""
Example provided at: https://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html
"""
from scipy.signal import butter, lfilter, sosfilt


def butter_bandpass(lowcut, highcut, fs, order=5, output="ba"):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    return butter(order, [low, high], btype='band', output=output)


def butter_bandpass_filter(data, lowcut, highcut, fs, order=5, output="sos"):
    butter_values = butter_bandpass(lowcut, highcut, fs, order=order, output=output)
    if output == "ba":
        b, a = butter_values
        y = lfilter(b, a, data)
    elif output == "sos":
        sos = butter_values
        y = sosfilt(sos, data)
    else:
        raise ValueError(f"Unsupported output type: {output}")
    return y


def run():
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import freqz

    # Sample rate and desired cutoff frequencies (in Hz).
    fs = 5000.0
    lowcut = 600.0
    highcut = 1250.0

    # Plot the frequency response for a few different orders.
    plt.figure(1)
    plt.clf()
    for order in [2, 4, 6]:  # alternative: [3, 6, 9]
        b, a = butter_bandpass(lowcut, highcut, fs, order=order, output="ba")
        w, h = freqz(b, a, worN=2000)
        plt.plot((fs * 0.5 / np.pi) * w, abs(h), label="order = %d" % order)

    plt.plot([0, 0.5 * fs], [np.sqrt(0.5), np.sqrt(0.5)],
             '--', label='sqrt(0.5)')
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Gain')
    plt.grid(True)
    plt.legend(loc='best')

    # Filter a noisy signal.
    T = 0.05
    nsamples = int(T * fs)
    t = np.linspace(0, T, nsamples, endpoint=False)
    a = 0.05  # sine wave amplitude (used here to emphasize our desired frequency)
    f0 = 600.0
    x = 0.1 * np.sin(2 * np.pi * 1.2 * np.sqrt(t))  # slow, low-frequency sweep
    x += 0.01 * np.cos(2 * np.pi * 312 * t + 0.1)  # below the passband
    x += 0.01 * np.cos(2 * np.pi * 510 * t + 0.1)  # just below the lowcut (600 Hz)
    x += 0.01 * np.cos(2 * np.pi * 520 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 530 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 540 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 550 * t + 0.1)  # just below the lowcut
    x += 0.01 * np.cos(2 * np.pi * 1200 * t + 0.1)  # inside the passband
    x += a * np.cos(2 * np.pi * f0 * t + .11)  # desired frequency, at the lower passband edge
    x += 0.03 * np.cos(2 * np.pi * 2000 * t)  # above the passband
    plt.figure(2)
    plt.clf()
    plt.plot(t, x, label='Noisy signal')

    y = butter_bandpass_filter(x, lowcut, highcut, fs, order=6)
    plt.plot(t, y, label='Filtered signal')
    plt.xlabel('time (seconds)')
    plt.hlines([-a, a], 0, T, linestyles='--')
    plt.grid(True)
    plt.axis('tight')
    plt.legend(loc='upper left')

    plt.show()


if __name__ == "__main__":
    run()
15 changes: 7 additions & 8 deletions examples/notebooks/adaptedpacificsounddetecthumpbacksong.py
@@ -461,7 +461,6 @@ def run(

"""## Filters
### Me
I want to play around with different filters for the whale noises.
Applying the filter to the signal will allow me to emphasize some signals and mitigate others.
"""
@@ -477,29 +476,29 @@ def run(
    sample_rate = sample_rate,
)

def low_pass(
def average_filter(
    signal = signal,
    factor = 0.01,
    size = 100,
    factor = 0.1,
):
    low_pass_filter = np.ones(size)*factor
    filtered_signal = np.convolve(signal, low_pass_filter)
    average_array = (np.ones(size)/size)*factor
    filtered_signal = np.convolve(signal, average_array)
    return filtered_signal


factor = 0.1
size = 100

filtered_signal = low_pass(signal, factor=factor, size=size)
filtered_signal = average_filter(signal, factor=factor, size=size)
# plot_play(filtered_signal, title=f"Low-pass filter - factor:{factor} size:{size}")

factor = 0.1
size = 10

filtered_signal = low_pass(signal, factor=factor, size=size)
filtered_signal = average_filter(signal, factor=factor, size=size)
# plot_play(filtered_signal, title=f"Low-pass filter - factor:{factor} size:{size}")

"""### Claude"""
"""### Ask an agent"""

(
sample_start,
Binary file added img/detections.png
Binary file added img/model_results.png
Binary file added img/spectrogram.png
22 changes: 16 additions & 6 deletions src/pipeline/app.py
@@ -2,22 +2,32 @@

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
from stages.search import GeometrySearch
from stages.audio import RetrieveAudio, WriteAudio
from stages.audio import RetrieveAudio, WriteAudio, WriteSiftedAudio
from stages.sift import Butterworth

from config import load_pipeline_config
config = load_pipeline_config()

def run():
    # Initialize pipeline options
    pipeline_options = PipelineOptions()
    pipeline_options.view_as(SetupOptions).save_main_session = True
    args = {
        "start": config.input.start,
        "end": config.input.end
    }

    with beam.Pipeline(options=pipeline_options) as p:
        input_data = p | "Create Input" >> beam.Create([{'start': '2016-12-21T00:30:0', 'end':"2016-12-21T00:40:0"}])
        search_results = input_data | "Run Geometry Search" >> beam.ParDo(GeometrySearch())
        audio_results = search_results | "Retrieve Audio" >> beam.ParDo(RetrieveAudio())
        # filtered_audio = audio_results | "Filter Frequency" >> FilterFrequency()
        input_data = p | "Create Input" >> beam.Create([args])
        search_results = input_data | "Run Geometry Search" >> beam.ParDo(GeometrySearch())

        audio_results = search_results | "Retrieve Audio" >> beam.ParDo(RetrieveAudio())
        audio_files = audio_results | "Store Audio (temp)" >> beam.ParDo(WriteAudio())

        sifted_audio = audio_results | "Sift Audio" >> Butterworth()
        sifted_audio_files = sifted_audio | "Store Sifted Audio" >> beam.ParDo(WriteSiftedAudio("butterworth"))

        # For debugging, you can write the output to a text file
        audio_files = audio_results | "Store Audio (temp)" >> beam.ParDo(WriteAudio())
        # audio_files | "Write Audio Output" >> beam.io.WriteToText('audio_files.txt')
        # search_results | "Write Search Output" >> beam.io.WriteToText('search_results.txt')

25 changes: 17 additions & 8 deletions src/pipeline/config.yaml
@@ -1,6 +1,7 @@
pipeline:
  general:
    verbose: true
    debug: true

  input:
    start: "2016-12-21T00:30:00"
@@ -28,16 +29,24 @@ pipeline:
    source_sample_rate: 16000
    margin: 30 # TODO set to 900 # seconds
    offset: 13 # TODO set to 0 # hours
    output_path_template: "data/audio/{year}/{month:02}/{filename}"
    skip_existing: false

  detection_filter:
    highcut: 1500
    lowcut: 50
    order: 10
    frequency_threshold: 0.015
    output_path_template: "data/audio/raw/{year}/{month:02}/{filename}"
    skip_existing: false # if true, skip downstream processing of existing audio files (false during development)

  sift:
    output_path_template: "data/audio/{sift}/{year}/{month:02}/{filename}"
    max_duration: 600 # seconds
    plot: true
    plot_path_template: "data/plots/{sift}/{year}/{month:02}/{day:02}/{plot_name}.png"
    window_size: 512

    # Specific sift-mechanism parameters
    butterworth:
      highcut: 1500
      lowcut: 50
      order: 5
      output: "sos" # "sos" or "ba"
      sift_threshold: 0.015

  model:
    url: https://tfhub.dev/google/humpback_whale/1
    model_sample_rate: 10000
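For illustration, the `sift.butterworth` block above might be consumed roughly as follows. The attribute paths (`config.sift.butterworth`, `config.audio.source_sample_rate`) are assumptions modeled on the `config.input.*` access in `app.py`.

```python
from scipy.signal import butter

from config import load_pipeline_config

config = load_pipeline_config()
bw = config.sift.butterworth  # attribute path assumed from the YAML layout above
sos = butter(
    bw.order,
    [bw.lowcut, bw.highcut],
    btype="band",
    output="sos",  # config.yaml also allows "ba"; "sos" is more numerically stable
    fs=config.audio.source_sample_rate,  # assumed lookup
)
```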