libMontyML.py

Glenn Thompson edited this page Nov 12, 2021 · 4 revisions

read_volcano_def():

  • reads volcano_def.csv into a DataFrame
  • strips left space from column headings
  • returns DataFrame
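
A minimal sketch of the steps above, assuming pandas is used for the read. The column names and rows in the sample are hypothetical and only illustrate the leading-space problem in the headings; the real volcano_def.csv layout is not shown on this page.

```python
import io
import pandas as pd


def read_volcano_def(source):
    """Read a volcano_def-style CSV and strip leading spaces from the headings."""
    df = pd.read_csv(source)
    # ' description' -> 'description'; cell values are left untouched
    df.columns = df.columns.str.lstrip()
    return df


# Hypothetical file contents, for illustration only
sample = io.StringIO("subclass, description\nr, rockfall\ne, lp-rockfall\n")
volcano_def = read_volcano_def(sample)
print(list(volcano_def.columns))
```

Only the headings are cleaned here, so values such as " rockfall" keep their leading space.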

build_master_event_catalog():

  • reads and concatenates all reawav_MVOE_YYYYMM.csv files, sets index to filetime and sorts.
  • for mainclass 'R' and 'D', sets subclass=mainclass, then drops mainclass
  • initializes a column for each subclass in subclasses_for_ML (100% for the original subclass, 0% elsewhere)
  • initializes weight to 3, and checked, split, delete and ignore to False
  • returns the DataFrame and saves it to a CSV file
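
The catalog-building steps can be sketched roughly as follows. This is an illustration under assumptions, not the library code: the label list SUBCLASSES_FOR_ML, the helper name build_catalog and the sample rows are all hypothetical, and the monthly CSV files are replaced by in-memory DataFrames so the example runs on its own.

```python
import pandas as pd

SUBCLASSES_FOR_ML = ["D", "R", "r", "e"]  # assumed label set, for illustration


def build_catalog(monthly_frames, subclasses=SUBCLASSES_FOR_ML):
    """Concatenate monthly tables and add the bookkeeping columns."""
    df = pd.concat(monthly_frames).set_index("filetime").sort_index()
    # Rows with mainclass 'R' or 'D' take the mainclass as their subclass
    is_rd = df["mainclass"].isin(["R", "D"])
    df["subclass"] = df["subclass"].where(~is_rd, df["mainclass"])
    df = df.drop(columns="mainclass")
    # One percentage column per label: 100 for the original subclass, else 0
    for label in subclasses:
        df[label] = (df["subclass"] == label).astype(int) * 100
    df["weight"] = 3
    for flag in ["checked", "split", "delete", "ignore"]:
        df[flag] = False
    return df


# Hypothetical stand-ins for two monthly CSV files
jan = pd.DataFrame({
    "filetime": ["2001-01-02T00:00:00", "2001-01-01T00:00:00"],
    "mainclass": ["LV", "R"],
    "subclass": ["r", None],
})
feb = pd.DataFrame({
    "filetime": ["2001-02-01T00:00:00"],
    "mainclass": ["LV"],
    "subclass": ["e"],
})
catalog = build_catalog([jan, feb])
print(catalog[["subclass", "R", "r", "e", "weight", "checked"]])
```

Saving is left out of the sketch; a call such as catalog.to_csv(...) with a path of your choosing would cover the last step.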

parse_STATION0HYP():

  • selects lines of length 25, reads station name and coordinates
  • creates a DataFrame with columns name, lat, lon, elev

get_weighted_fingerprints():

  • calls _select_best_events() to build a dict of the best N events of each class
  • loops over subclasses_in_ML, selecting that element of the dict
  • if there are at least 30 events, initializes a fingerprint DataFrame with columns mean, std, 25%, 50%, 75%
  • computes these statistics with statsmodels.stats.weightstats.DescrStatsW(), weighting each event by weight*subclass_percentage
  • returns fingerprints dict of DataFrames
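
To make the weighting concrete, here is a small sketch of the weighted mean and standard deviation for one subclass. It uses plain NumPy as a stand-in for DescrStatsW (the formulas match its mean and std with the default ddof=0) and leaves out the 25%, 50% and 75% quantile columns. The function name, the feature column peakF and the sample values are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def weighted_fingerprint(events, label, feature_cols, min_events=30):
    """Weighted mean/std of each feature for one subclass, or None if too few events."""
    if len(events) < min_events:
        return None
    # Each event counts in proportion to weight * percentage assigned to this label
    w = (events["weight"] * events[label]).to_numpy(dtype=float)
    x = events[feature_cols].to_numpy(dtype=float)
    mean = np.average(x, axis=0, weights=w)
    var = np.average((x - mean) ** 2, axis=0, weights=w)  # ddof=0
    return pd.DataFrame({"mean": mean, "std": np.sqrt(var)}, index=feature_cols)


# Two hypothetical events, both labelled 100% 'r', with weights 3 and 9
events = pd.DataFrame({"peakF": [1.0, 5.0], "weight": [3, 9], "r": [100, 100]})
fingerprint = weighted_fingerprint(events, "r", ["peakF"], min_events=2)
too_few = weighted_fingerprint(events, "r", ["peakF"])  # default threshold of 30
print(fingerprint)
```

The second event carries three times the weight of the first, so the mean is pulled toward 5.0 rather than sitting at the midpoint.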

save_fingerprints() just saves fingerprints dict to CSV files.

qc_event():

  • calls _select_next_event(), which examines the checked events and picks the least common label from subclasses_for_ML. It then extracts all matching unchecked events and ranks them by detection_quality, snr, and quality, in that order. The ranking is also multiplied by weight, but that should always be 3 for unchecked events. Finally it sorts in descending order and returns only the first row of the DataFrame.
  • loops over all rows (but I think there should only be 1):
  • reads the picklefile, applies a 0.5-25.0 Hz, 4-pole filter, and calls deconvolve_instrument_response(). This loops over each trace and, if the picklefile is not already in m/s units, calls load_mvo_inventory() on the CAL directory, which tries to return the corresponding Inventory; tr.remove_response() is then called and the units are updated.
  • computes amplitude and energy for each tr, and stores these in dfenergy DataFrame. Could just call an existing method here.
  • oddly, a suggested_weight is then computed, but it appears to be log10 of energy/2 of the first tr if that tr is in Counts, otherwise None. So why bother correcting the data? And the focus on st[0] seems arbitrary.
  • plot event
  • add station locations to tr.stats in st by calling add_station_locations() and then call plot_amplitude_locations()
  • print info including row, tracedf (id, medianF, bw_min, peakF, bw_max, band_ratio (1-6, 6-11), kurtosis)
  • guess label based on fingerprints and row by calling _guess_subclass()
  • show Seisan subclasses (from VOLCANO.DEF)
  • parse input event percentages and weight, updating DataFrame accordingly (or q/s/I/d to quit, split, ignore or delete)
  • returns the DataFrame
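
The event-selection step at the start of qc_event() can be sketched like this. It is a guess at the logic described above, not the actual _select_next_event(): "least common" is assumed to mean the label with the fewest checked events by original subclass, the ranking is done with a multi-column sort, and the multiplication by weight is omitted because it is constant for unchecked events. All sample rows are hypothetical.

```python
import pandas as pd


def select_next_event(df, subclasses):
    """Pick the best unchecked event of the label with the fewest checked events."""
    checked = df[df["checked"]]
    # Count checked events per label; labels never checked count as 0
    counts = checked["subclass"].value_counts().reindex(subclasses, fill_value=0)
    rarest = counts.idxmin()
    candidates = df[(~df["checked"]) & (df["subclass"] == rarest)]
    ranked = candidates.sort_values(
        ["detection_quality", "snr", "quality"], ascending=False
    )
    return ranked.head(1)  # one-row DataFrame (empty if no candidates)


df = pd.DataFrame(
    {
        "subclass": ["r", "r", "e", "e"],
        "checked": [True, True, False, False],
        "detection_quality": [1, 1, 2, 3],
        "snr": [1.0, 1.0, 5.0, 1.0],
        "quality": [1, 1, 1, 1],
    },
    index=["ev1", "ev2", "ev3", "ev4"],
)
next_event = select_next_event(df, ["r", "e"])
print(next_event)
```

Because the result is a one-row DataFrame, a loop over its rows runs at most once, which fits the observation that there should only be one row.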

remove_marked_events(): subsets the catalog DataFrame to eliminate rows with the delete, ignore or split flags set to True. Returns a new DataFrame.
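
A short sketch of that filter, assuming the three flags are boolean columns as initialized in build_master_event_catalog(). The sample rows are hypothetical.

```python
import pandas as pd


def remove_marked_events(df):
    """Return a copy without rows flagged delete, ignore or split."""
    marked = df[["delete", "ignore", "split"]].any(axis=1)
    return df[~marked].copy()


flags = pd.DataFrame(
    {
        "delete": [False, True, False, False],
        "ignore": [False, False, True, False],
        "split": [False, False, False, False],
    },
    index=["ev1", "ev2", "ev3", "ev4"],
)
kept = remove_marked_events(flags)
print(list(kept.index))
```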

to_AAA():

  • subsets catalog DataFrame on checked events
  • eliminates labels with fewer than 10 events; those remaining are the included_subclasses, and their events are merged into dfAAA
  • columns renamed: twin->length, new_subclass->class
  • subsetted on columns class, year, month, day, hour, minute, second, length, path
  • WAV-files optionally copied to /new/path/class/WAVfile
  • dfAAA written to CSV
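
The table-shaping part of to_AAA() might look roughly like the sketch below. It assumes the date and time columns (year through second) and path already exist in the catalog, and it skips the optional WAV-file copy and the CSV write. The helper name, the threshold argument and the sample rows are hypothetical.

```python
import pandas as pd

AAA_COLUMNS = ["class", "year", "month", "day", "hour", "minute", "second",
               "length", "path"]


def to_aaa(df, min_events=10):
    """Build the AAA table from checked events of sufficiently common labels."""
    checked = df[df["checked"]]
    counts = checked["new_subclass"].value_counts()
    included_subclasses = counts[counts >= min_events].index
    kept = checked[checked["new_subclass"].isin(included_subclasses)]
    renamed = kept.rename(columns={"twin": "length", "new_subclass": "class"})
    return renamed[AAA_COLUMNS]


# Hypothetical catalog rows; min_events is lowered so the example stays small
rows = pd.DataFrame({
    "new_subclass": ["r", "r", "e"],
    "checked": [True, True, True],
    "twin": [30.0, 45.0, 20.0],
    "year": 2001, "month": 1, "day": [1, 2, 3],
    "hour": 0, "minute": 0, "second": 0.0,
    "path": ["a.wav", "b.wav", "c.wav"],
})
dfAAA = to_aaa(rows, min_events=2)
print(dfAAA)
```

The single 'e' event falls below the threshold, so only the two 'r' events survive.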

report_checked_events():

  • subsets df on checked events
  • reports:
  • number of checked events
  • number of classified events after those flagged to ignore, delete or split are removed
  • number of events by label in subclasses_for_ML
  • total number of events matching subclasses_for_ML
  • total number of reclassified events (different label than original)
  • total number of events that were correctly classified
  • error rate
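
The counts in this report can be sketched as below. Two things are assumptions here, since the page does not define them: the original and revised labels are taken to live in columns subclass and new_subclass, and the error rate is taken to be reclassified events divided by all classified events matching subclasses_for_ML. The sample rows are hypothetical.

```python
import pandas as pd


def report_checked_events(df, subclasses):
    """Summarize checked events; returns a dict instead of printing."""
    checked = df[df["checked"]]
    flagged = checked[["ignore", "delete", "split"]].any(axis=1)
    classified = checked[~flagged]
    matching = classified[classified["new_subclass"].isin(subclasses)]
    by_label = matching["new_subclass"].value_counts().reindex(subclasses, fill_value=0)
    reclassified = int((matching["new_subclass"] != matching["subclass"]).sum())
    total = len(matching)
    return {
        "checked": len(checked),
        "classified": len(classified),
        "by_label": by_label.to_dict(),
        "matching": total,
        "reclassified": reclassified,
        "correct": total - reclassified,
        # Assumed definition: share of matching events whose label changed
        "error_rate": reclassified / total if total else float("nan"),
    }


df = pd.DataFrame({
    "subclass": ["r", "r", "e", "e", "r"],
    "new_subclass": ["r", "e", "e", "e", "r"],
    "checked": [True, True, True, True, False],
    "ignore": [False, False, False, True, False],
    "delete": False,
    "split": False,
})
report = report_checked_events(df, ["r", "e"])
print(report)
```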

Others: