libMontyML.py
Glenn Thompson edited this page Nov 12, 2021 · 4 revisions
read_volcano_def():
- reads volcano_def.csv into a DataFrame
- strips left space from column headings
- returns DataFrame
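A minimal sketch of the steps above, assuming pandas. The function name `read_volcano_def_sketch` and the example column names are hypothetical; the real volcano_def.csv layout may differ.

```python
import io
import pandas as pd

def read_volcano_def_sketch(source):
    """Read a volcano_def-style CSV and strip leading spaces from the headings."""
    df = pd.read_csv(source)
    # Headings written as "a, b, c" arrive as "a", " b", " c"; drop the left space.
    df.columns = [str(col).lstrip() for col in df.columns]
    return df

# Hypothetical file contents, used only to exercise the function.
example_csv = io.StringIO("subclass, etype, description\nr, LV, rockfall\ne, LV, lp-rockfall\n")
volcano_def = read_volcano_def_sketch(example_csv)
```

Only the headings are stripped here, as in the description; cell values keep any leading space.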
build_master_event_catalog():
- reads and concatenates all reawav_MVOE_YYYYMM.csv files, sets index to filetime and sorts.
- for mainclass 'R' and 'D', sets subclass=mainclass, then drops mainclass
- initializes a column for each subclass in subclasses_for_ML (100% for the original subclass, 0% elsewhere)
- initializes weight to 3, and checked, split, delete and ignore to False
- returns the DataFrame and saves it to a CSV file
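The steps above could look roughly like this. It is a sketch only: the label list, the function name and the tiny monthly frames are assumptions, the percentage columns are assumed to be named after the labels, and the CSV write is left out.

```python
import pandas as pd

# Assumed label list; the real subclasses_for_ML is defined in libMontyML.py.
SUBCLASSES_FOR_ML = ["r", "e", "l", "h", "t"]

def build_catalog_sketch(monthly_frames, subclasses=SUBCLASSES_FOR_ML):
    """Concatenate monthly catalogs and add the labelling columns."""
    df = pd.concat(monthly_frames, ignore_index=True)
    df = df.set_index("filetime").sort_index()
    # mainclass 'R' and 'D' events take the mainclass as their subclass.
    is_rd = df["mainclass"].isin(["R", "D"])
    df.loc[is_rd, "subclass"] = df.loc[is_rd, "mainclass"]
    df = df.drop(columns="mainclass")
    # One percentage column per label: 100 for the original subclass, 0 elsewhere.
    for label in subclasses:
        df[label] = (df["subclass"] == label).astype(float) * 100.0
    df["weight"] = 3
    for flag in ["checked", "split", "delete", "ignore"]:
        df[flag] = False
    return df

# Hypothetical stand-ins for two monthly CSV files.
jan = pd.DataFrame({"filetime": ["2001-01-02", "2001-01-01"],
                    "mainclass": ["LV", "R"], "subclass": ["r", None]})
feb = pd.DataFrame({"filetime": ["2001-02-01"],
                    "mainclass": ["LV"], "subclass": ["e"]})
catalog = build_catalog_sketch([jan, feb])
```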
parse_STATION0HYP():
- selects lines of length 25, reads station name and coordinates
- creates a DataFrame with columns name, lat, lon, elev
get_weighted_fingerprints():
- calls _select_best_events() to build a dict of the best N events of each class
- loops over subclasses_in_ML, selecting that element of the dict
- if a subclass has at least 30 events, initializes a fingerprint DataFrame with columns mean, std, 25%, 50%, 75%
- computes these stats with statsmodels.stats.weightstats.DescrStatsW(), weighting each event by weight*subclass_percentage
- returns fingerprints dict of DataFrames
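To illustrate the weighting, here is a sketch that swaps DescrStatsW for plain NumPy so it has no statsmodels dependency. It computes only the weighted mean and a population-style (ddof=0) weighted std; the quantile columns are omitted. The function name, the metric name `peakF` as a fingerprint row, and the example events are assumptions.

```python
import numpy as np
import pandas as pd

def weighted_fingerprint_sketch(events, label, metrics, min_events=30):
    """Weighted mean/std per metric for one subclass, or None if too few events."""
    if len(events) < min_events:
        return None
    # Each event is weighted by weight * subclass percentage.
    w = (events["weight"] * events[label]).to_numpy(dtype=float)
    rows = {}
    for metric in metrics:
        x = events[metric].to_numpy(dtype=float)
        mean = np.average(x, weights=w)
        var = np.average((x - mean) ** 2, weights=w)
        rows[metric] = {"mean": mean, "std": np.sqrt(var)}
    return pd.DataFrame.from_dict(rows, orient="index")

# Hypothetical events: 30 equally weighted rockfalls with two peak frequencies.
events = pd.DataFrame({"peakF": [1.0] * 15 + [3.0] * 15,
                       "weight": [3] * 30,
                       "r": [100.0] * 30})
fingerprint = weighted_fingerprint_sketch(events, "r", ["peakF"])
too_few = weighted_fingerprint_sketch(events.head(10), "r", ["peakF"])
```

Because the mean and std are normalized by the sum of the weights, it makes no difference whether the percentage is expressed as 0-100 or 0-1.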
save_fingerprints() just saves fingerprints dict to CSV files.
qc_event():
- calls _select_next_event(), which examines the checked events and picks the least common label from subclasses_for_ML. It then extracts all matching unchecked events and ranks them on detection_quality, snr and quality, in that order. It also multiplies by weight, but that should always be 3 for unchecked events. Finally it sorts in descending order and returns the DataFrame subsetted to the first row.
- loops over all rows (but I think there should only be 1):
- reads the picklefile, applies a 4-pole, 0.5-25.0 Hz filter, and calls deconvolve_instrument_response(). That function loops over each trace and, if the picklefile is not already in m/s units, calls load_mvo_inventory() on the CAL directory, which tries to return the corresponding Inventory. tr.remove_response() is then called and the units are updated.
- computes amplitude and energy for each tr, and stores these in the dfenergy DataFrame. Could just call an existing method here.
- oddly, a suggested_weight is then computed, but it appears to be log10 of energy/2 of the first tr if that tr is in Counts, otherwise None. So why bother correcting the data? And the focus on st[0] seems arbitrary.
- plot event:
- adds station locations to tr.stats for each trace in st by calling add_station_locations(), then calls plot_amplitude_locations()
- prints info including row and tracedf (id, medianF, bw_min, peakF, bw_max, band_ratio (1-6, 6-11), kurtosis)
- guesses a label based on fingerprints and row by calling _guess_subclass()
- shows Seisan subclasses (from VOLCANO.DEF)
- parses the input event percentages and weight, updating the DataFrame accordingly (or q/s/I/d to quit, split, ignore or delete)
- returns the DataFrame
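The selection step attributed to _select_next_event() might be sketched as below. This is a guess at the logic, not the actual code: it assumes the ranking is a lexicographic sort on the three columns, and it leaves out the multiplication by weight since that is constant for unchecked events. The function name and example catalog are hypothetical.

```python
import pandas as pd

def select_next_event_sketch(df, subclasses):
    """Pick the best unchecked event of the label that has been checked least."""
    checked = df[df["checked"]]
    # Count checked events per label, including labels not yet seen.
    counts = checked["subclass"].value_counts().reindex(subclasses, fill_value=0)
    rarest = counts.idxmin()
    candidates = df[(~df["checked"]) & (df["subclass"] == rarest)]
    ranked = candidates.sort_values(
        ["detection_quality", "snr", "quality"], ascending=False)
    return ranked.head(1)

# Hypothetical catalog: 'e' has fewer checked events than 'r'.
example = pd.DataFrame({
    "subclass": ["r", "r", "e", "e", "e"],
    "checked": [True, True, True, False, False],
    "detection_quality": [5, 5, 5, 2, 4],
    "snr": [10.0, 9.0, 8.0, 7.0, 6.0],
    "quality": [3, 3, 3, 3, 3],
}, index=["a", "b", "c", "d", "e"])
next_event = select_next_event_sketch(example, ["r", "e"])
```

Since head(1) always yields at most one row, this is consistent with the suspicion that the loop over rows only ever sees one event.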
remove_marked_events(): subsets the catalog DataFrame to eliminate rows with the delete, ignore or split flag set to True. Returns a new DataFrame.
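In pandas terms this is a boolean mask. A minimal sketch, with a hypothetical function name and example frame:

```python
import pandas as pd

def remove_marked_events_sketch(df):
    """Return a copy of df without rows flagged delete, ignore or split."""
    marked = df["delete"] | df["ignore"] | df["split"]
    return df[~marked].copy()

# Hypothetical catalog with one row per flag plus one clean row.
flags = pd.DataFrame({
    "delete": [True, False, False, False],
    "ignore": [False, True, False, False],
    "split": [False, False, True, False],
}, index=["ev1", "ev2", "ev3", "ev4"])
kept = remove_marked_events_sketch(flags)
```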
to_AAA():
- subsets catalog DataFrame on checked events
- eliminates labels with fewer than 10 events; those remaining are the included_subclasses, and are merged into dfAAA
- columns renamed: twin->length, new_subclass->class
- subsetted on columns class, year, month, day, hour, minute, second, length, path
- WAV-files optionally copied to /new/path/class/WAVfile
- dfAAA written to CSV
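A sketch of the DataFrame side of to_AAA(), leaving out the WAV-file copy and the CSV write. It assumes the date/time and path columns already exist in the catalog; the function name and example rows are hypothetical.

```python
import pandas as pd

AAA_COLUMNS = ["class", "year", "month", "day", "hour",
               "minute", "second", "length", "path"]

def to_aaa_sketch(df, min_events=10):
    """Build an AAA-style table from checked events with well-populated labels."""
    checked = df[df["checked"]]
    counts = checked["new_subclass"].value_counts()
    # Keep only labels with at least min_events checked events.
    included_subclasses = counts[counts >= min_events].index
    kept = checked[checked["new_subclass"].isin(included_subclasses)]
    df_aaa = kept.rename(columns={"twin": "length", "new_subclass": "class"})
    return df_aaa[AAA_COLUMNS]

# Hypothetical catalog: 10 checked 'r' events and 3 checked 'e' events.
rows = []
for i in range(13):
    rows.append({"new_subclass": "r" if i < 10 else "e", "checked": True,
                 "twin": 30.0, "year": 2001, "month": 1, "day": 1, "hour": 0,
                 "minute": i, "second": 0.0, "path": f"WAV/event{i}"})
df_aaa = to_aaa_sketch(pd.DataFrame(rows))
```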
report_checked_events():
- subsets df on checked events
- reports:
- number of checked events
- number of classified events after those flagged ignore, delete or split are removed
- number of events by label in subclasses_for_ML
- total number of events matching subclasses_for_ML
- total number of reclassified events (different label than original)
- total number of events that were correctly classified
- error rate
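The counts in this report could be computed along these lines. This is a sketch under assumptions: the function name is hypothetical, the original and revised labels are assumed to live in subclass and new_subclass, and the error rate is assumed to be reclassified events divided by events matching subclasses_for_ML, which the notes above do not actually define.

```python
import pandas as pd

def report_checked_events_sketch(df, subclasses):
    """Summarize checked events; returns a dict rather than printing."""
    checked = df[df["checked"]]
    flagged = checked["ignore"] | checked["delete"] | checked["split"]
    classified = checked[~flagged]
    matching = classified[classified["new_subclass"].isin(subclasses)]
    by_label = matching["new_subclass"].value_counts().reindex(
        subclasses, fill_value=0)
    reclassified = int((matching["new_subclass"] != matching["subclass"]).sum())
    correct = len(matching) - reclassified
    # Assumed definition: fraction of matching events whose label changed.
    error_rate = reclassified / len(matching) if len(matching) else float("nan")
    return {"checked": len(checked), "classified": len(classified),
            "by_label": by_label.to_dict(), "matching": len(matching),
            "reclassified": reclassified, "correct": correct,
            "error_rate": error_rate}

# Hypothetical catalog: three checked events, one of them ignored.
example = pd.DataFrame({
    "subclass": ["r", "r", "e", "r"],
    "new_subclass": ["r", "e", "e", "r"],
    "checked": [True, True, True, False],
    "ignore": [False, False, True, False],
    "delete": [False, False, False, False],
    "split": [False, False, False, False],
})
report = report_checked_events_sketch(example, ["r", "e"])
```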
Others: