libMontyML.py

read_volcano_def():

reads volcano_def.csv into a DataFrame
strips left space from column headings
returns DataFrame
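
A minimal sketch of the two steps above, assuming pandas is used for the read. The column names and the inline sample are hypothetical; only the "strip left space from column headings" behaviour comes from these notes.

```python
import io
import pandas as pd

def read_volcano_def(csv_source):
    """Read a volcano_def-style CSV and strip leading spaces from the headings."""
    df = pd.read_csv(csv_source)
    # Headings written as "a, b, c" arrive as "a", " b", " c"; remove the left space.
    df.columns = df.columns.str.lstrip()
    return df

# Hypothetical file contents, for illustration only.
sample = io.StringIO("subclass, etype, description\nr, LV, rockfall\n")
df = read_volcano_def(sample)
print(list(df.columns))
```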

build_master_event_catalog():

reads and concatenates all reawav_MVOE_YYYYMM.csv files, sets index to filetime and sorts.
for mainclass 'R' and 'D', set subclass=mainclass, then drop mainclass
initialize columns for each subclass in subclasses_for_ML (100% for original subclass, 0% elsewhere)
initialize weight to 3, and checked, split, delete and ignore to False
return DataFrame and save to CSV file
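
The initialization steps above could be sketched as follows. This is an illustration under assumptions, not the module's code: the label list, the helper name init_catalog() and the two sample rows are hypothetical, and reading the monthly CSV files and saving the result are left out.

```python
import pandas as pd

# Hypothetical label list; the real subclasses_for_ML lives in libMontyML.py.
subclasses_for_ML = ["D", "R", "r", "e", "l", "h", "t"]

def init_catalog(frames):
    """Concatenate monthly frames and add the bookkeeping columns."""
    df = pd.concat(frames, ignore_index=True)
    df = df.set_index("filetime").sort_index()
    # For mainclass 'R' and 'D', the subclass is taken from mainclass.
    mask = df["mainclass"].isin(["R", "D"])
    df["subclass"] = df["subclass"].where(~mask, df["mainclass"])
    df = df.drop(columns="mainclass")
    # One percentage column per label: 100 for the original subclass, 0 elsewhere.
    for label in subclasses_for_ML:
        df[label] = (df["subclass"] == label).astype(float) * 100.0
    df["weight"] = 3
    for flag in ["checked", "split", "delete", "ignore"]:
        df[flag] = False
    return df

# Two hypothetical one-row "monthly" frames, deliberately out of time order.
feb = pd.DataFrame({"filetime": ["2001-02-01T00:00:00"], "mainclass": ["LV"], "subclass": ["r"]})
jan = pd.DataFrame({"filetime": ["2001-01-01T00:00:00"], "mainclass": ["R"], "subclass": [""]})
catalog = init_catalog([feb, jan])
print(catalog[["subclass", "R", "r", "weight", "checked"]])
```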

parse_STATION0HYP():

selects lines of length 25, reads station name and coordinates
creates a DataFrame with columns name, lat, lon, elev

get_weighted_fingerprints():

calls _select_best_events() to build a dict of the best N events of each class
loops over subclasses_for_ML, selecting that element of the dict

if at least 30 events, initialize a fingerprint DataFrame, with columns mean, std, 25%, 50%, 75%

computes these stats with a weighted statistical model using statsmodels.stats.weightstats.DescrStatsW() with events weighted using weight*subclass_percentage

returns fingerprints dict of DataFrames
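
To make the weighting concrete, here is a NumPy-only sketch of a weighted mean and standard deviation with each event weighted by weight*subclass_percentage. It is a stand-in for illustration, not the DescrStatsW call itself; it uses the population form (no degrees-of-freedom correction), which I believe matches the DescrStatsW default of ddof=0. The quartiles are omitted, and the feature values are made up.

```python
import numpy as np

def weighted_mean_std(values, weights):
    """Weighted mean and (population) standard deviation of one feature."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * values) / np.sum(weights)
    var = np.sum(weights * (values - mean) ** 2) / np.sum(weights)
    return mean, np.sqrt(var)

# Hypothetical values of one feature for three events of one subclass.
feature = [2.0, 4.0, 6.0]
weight = np.array([3, 3, 6])
subclass_percentage = np.array([100.0, 50.0, 100.0])

# Each event counts in proportion to weight * subclass_percentage.
w = weight * subclass_percentage
mean, std = weighted_mean_std(feature, w)
print(mean, std)
```

With equal weights this reduces to the ordinary mean and np.std(), which is a quick sanity check on the weighting.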

save_fingerprints() just saves fingerprints dict to CSV files.

qc_event():

calls _select_next_event(), which examines the checked events and picks the least common label from subclasses_for_ML. It then extracts all matching unchecked events and ranks them by detection_quality, snr, and quality, in that order. The ranking is also multiplied by weight, but that should always be 3 for unchecked events. Finally it sorts in descending order, subsets the DataFrame to the first row, and returns it.
loops over all rows (but I think there should only be 1):

reads the picklefile, applies a 0.5-25.0 Hz, 4-pole filter, and calls deconvolve_instrument_response(). This loops over each trace and, if the picklefile is not already in m/s units, calls load_mvo_inventory() on the CAL directory, which tries to return the corresponding Inventory. tr.remove_response() is then called and the units are updated.

computes amplitude and energy for each tr, and stores these in the dfenergy DataFrame. Could just call an existing method here.

oddly, a suggested_weight is then computed, but it appears to be log10 of energy/2 of the first tr if that tr is in Counts, otherwise None. So why bother correcting the data? And the focus on st[0] seems arbitrary.

plot event:

add station locations to tr.stats in st by calling add_station_locations() and then call plot_amplitude_locations()

print info including row, tracedf (id, medianF, bw_min, peakF, bw_max, band_ratio (1-6, 6-11), kurtosis)

guess label based on fingerprints and row by calling _guess_subclass()

show Seisan subclasses (from VOLCANO.DEF)

parse input event percentages and weight, updating DataFrame accordingly (or q/s/I/d to quit, split, ignore or delete)

returns the DataFrame
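
The selection logic described for _select_next_event() might look roughly like the sketch below. The label list and the sample catalog are hypothetical, and the multiplication by weight is left out on the assumption that it is constant (3) for unchecked events and so does not change the order.

```python
import pandas as pd

subclasses_for_ML = ["r", "e", "l"]  # hypothetical label list

def select_next_event(df):
    """Pick the best unchecked event of the label that is rarest among checked events."""
    checked = df[df["checked"]]
    # Count checked events per label, including labels with no checked events yet.
    counts = checked["subclass"].value_counts().reindex(subclasses_for_ML, fill_value=0)
    rarest = counts.idxmin()
    candidates = df[(~df["checked"]) & (df["subclass"] == rarest)]
    # Rank by detection_quality, then snr, then quality, all descending.
    ranked = candidates.sort_values(["detection_quality", "snr", "quality"], ascending=False)
    return ranked.head(1)

# Hypothetical catalog: label 'l' has no checked events, so it is chosen.
df = pd.DataFrame(
    {
        "subclass": ["r", "r", "e", "l", "l", "l"],
        "checked": [True, True, True, False, False, False],
        "detection_quality": [5, 4, 3, 2, 6, 6],
        "snr": [1.0, 1.0, 1.0, 8.0, 3.0, 9.0],
        "quality": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    },
    index=["ev1", "ev2", "ev3", "ev4", "ev5", "ev6"],
)
next_event = select_next_event(df)
print(next_event)
```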

remove_marked_events(): subsets the catalog DataFrame to eliminate rows with the delete, ignore or split flag set to True. Returns the new DataFrame.
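
A minimal sketch of that filter, assuming the three flags are boolean columns as initialized in build_master_event_catalog(). The sample rows are made up.

```python
import pandas as pd

def remove_marked_events(df):
    """Drop rows where any of the delete, ignore or split flags is True."""
    marked = df["delete"] | df["ignore"] | df["split"]
    return df[~marked].copy()

# Hypothetical four-event catalog; only the first row has no flag set.
catalog = pd.DataFrame(
    {
        "subclass": ["r", "e", "l", "h"],
        "delete": [False, True, False, False],
        "ignore": [False, False, True, False],
        "split": [False, False, False, True],
    }
)
kept = remove_marked_events(catalog)
print(kept)
```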

to_AAA():

subsets catalog DataFrame on checked events
eliminates labels with fewer than 10 events; the remaining labels are included_subclasses, and their events are merged into dfAAA
columns renamed: twin->length, new_subclass->class
subsetted on columns class, year, month, day, hour, minute, second, length, path
WAV-files optionally copied to /new/path/class/WAVfile
dfAAA written to CSV
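
The table-building part of to_AAA() could be sketched like this. The helper name to_aaa_table(), the min_events parameter and the sample data (including the path values) are hypothetical; copying WAV files and writing the CSV are omitted.

```python
import pandas as pd

AAA_COLUMNS = ["class", "year", "month", "day", "hour", "minute", "second", "length", "path"]

def to_aaa_table(df, min_events=10):
    """Keep checked events whose label has at least min_events examples."""
    checked = df[df["checked"]]
    counts = checked["new_subclass"].value_counts()
    included_subclasses = counts[counts >= min_events].index
    dfAAA = checked[checked["new_subclass"].isin(included_subclasses)]
    dfAAA = dfAAA.rename(columns={"twin": "length", "new_subclass": "class"})
    return dfAAA[AAA_COLUMNS]

# Hypothetical catalog: 12 events labelled 'r' and 2 labelled 'e', all checked.
catalog = pd.DataFrame(
    {
        "new_subclass": ["r"] * 12 + ["e"] * 2,
        "checked": True,
        "twin": 30.0,
        "year": 2001,
        "month": 1,
        "day": 1,
        "hour": 0,
        "minute": list(range(14)),
        "second": 0.0,
        "path": [f"event{i:02d}.wav" for i in range(14)],
    }
)
dfAAA = to_aaa_table(catalog)
print(dfAAA["class"].value_counts())
```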

report_checked_events():

subsets df on checked events
reports:

number of checked events

number of classified events after those flagged to ignore, delete or split removed

number of events by label in subclasses_for_ML

total number of events matching subclasses_for_ML

total number of reclassified events (different label than original)

total number of events that were correctly classified

error rate
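
A sketch of how the last three figures could be derived. It assumes the error rate is the fraction of classified events whose new label differs from the original, and that the original and new labels are in columns subclass and new_subclass; both are assumptions, and the sample rows are made up.

```python
import pandas as pd

def classification_summary(df):
    """Count reclassified and correctly classified events among checked ones."""
    checked = df[df["checked"]]
    # Remove events flagged to ignore, delete or split.
    kept = checked[~(checked["ignore"] | checked["delete"] | checked["split"])]
    reclassified = int((kept["new_subclass"] != kept["subclass"]).sum())
    correct = int((kept["new_subclass"] == kept["subclass"]).sum())
    total = reclassified + correct
    # Assumed definition: reclassified events as a fraction of classified events.
    error_rate = reclassified / total if total else float("nan")
    return {"reclassified": reclassified, "correct": correct, "error_rate": error_rate}

# Hypothetical catalog: five checked events, one of them flagged to ignore.
catalog = pd.DataFrame(
    {
        "subclass": ["r", "r", "e", "l", "h", "t"],
        "new_subclass": ["r", "e", "e", "l", "h", "t"],
        "checked": [True, True, True, True, True, False],
        "ignore": [False, False, False, False, True, False],
        "delete": False,
        "split": False,
    }
)
summary = classification_summary(catalog)
print(summary)
```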

Others:
