Commit
fix(commongrid): fix bugs and improve `compute_NASC` using flox (#1167)
* chore(deps): add flox dependency >=0.7.2
* fix(commongrid): fix 'compute_MVBS' so it works better and scales
  Under the hood, binning along ping time and echo range now uses flox. This improves scalability and relies on community-maintained code.
* docs: add small code comment
* refactor: change how ping_time index is retrieved
* refactor: remove for loop for channel
* test(mvbs): add mock Sv datasets and tests for dims (#2)
  Note that @leewujung also changed mean to nanmean for skipping NaNs in each bin.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix: change dask to numpy
  Changed the use of dask for log10 to numpy instead, since numpy can also handle dask array inputs properly.
* feat: Add method argument
  Added 'method' argument to 'get_MVBS_along_channels' and also exposed additional keyword arguments to control flox.
* fix(commongrid): Fixed to include lat lon
  Fixed 'compute_MVBS' function to now include latitude and longitude if the variables exist in the Sv dataset. Additionally, the flox method and keyword arguments are now exposed within the 'compute_MVBS' function.
  Ref: Issue #1002
* refactor: Set defaults to recommended
  After some investigation, @lsetiawan concluded that at this time the method 'map-reduce', engine 'numpy', and reindex True work best, so these are now set as defaults. Also, the echo range maximum is obtained through direct data slicing rather than computation.
* feat(commongrid): Add 'range_var' argument to 'compute_MVBS'
  Added a new argument 'range_var' so that users can set the range variable to perform binning with. There are 2 options, 'echo_range' and 'depth':
  - 'echo_range': When this is set, variable 'water_level' is now included in the resulting MVBS dataset
  - 'depth': A check is in place to ensure that this variable exists before it is used to perform range binning.
  Ref: Issue #1002
* fix: Add missing attributes for lat lon
* test: Update test to use random generator
* fix: Add case for no 'water_level'
  Added a case for datasets that don't have a water level variable.
* test(nasc): Remove 'compute_NASC' import to avoid failure
* fix: Removed assumption on echo range max
  Reverted the echo range max computation back to computing on the fly, since there may be some NaN values.
* test: Extract api test and add markings
  Extracted fixtures to conftest.py for commongrid. Additionally, cleaned up unused functions and marked tests as either unit or integration. Added a new test module called 'test_api.py' for 'commongrid.api'.
* test: Add latlon test for 'compute_MVBS'
  Added a test for an Sv dataset that contains latitude and longitude going through 'compute_MVBS' to ensure that those variables get propagated through.
  Ref: #1002
* test: Add small get_MVBS_along_channels test
  Added test for 'get_MVBS_along_channels' with either 'depth' as the 'range_var' or checking whether 'has_positions' is True or False.
* refactor: Integrate suggested changes
  Integrated suggested changes from review, such as additional comments in code, fixing some variable names, and extracting out the lin2log and log2lin functions. Additionally, echopype now imports the pint library to start having unit checks in the input for compute_MVBS.
* test: Added check for position values
* test: Update range_meter_bin to strings
* test: Added 'compute_MVBS' values test
* Update echopype/tests/utils/test_processinglevels_integration.py
  compute_MVBS should now preserve the processing level attributes, so test for presence rather than absence.
* test: Add 'nan' sprinkles
  Sprinkled 'nan' values all over 'echo_range' to ensure that computed values from 'compute_MVBS' don't take the 'nan' values into account. Added a check for the expected distribution of 'nan' in the resulting array.
  Ref: #1124 (comment)
* revert: Revert the use of 'pint'
  Removed the dependency on 'pint' and used a simple regex to ensure that the 'range_bin' input has unit 'm'. Renamed the 'range_meter_bin' argument to 'range_bin'.
  Ref: #1124 (comment)
* feat: Allow 'range_bin' to have space
* fix: Apply suggestions from code review
  Applied fix by @emiliom for the regex not capturing decimal values.
  Ref: https://github.com/OSOceanAcoustics/echopype/pull/1124/files#r1320422121
  Co-authored-by: Emilio Mayorga <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* test: Fix test text for wrong unit
* test: Remove the 'e.g.' part on pytest
  Removed the part with '(e.g., '10m')' since it was breaking pytest's regex matching.
* revive the function to make changes easier to see
* add TODOs
* add computation steps
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* remove unused _lin2log
* test: Remove remnant for test_ek.py
* refactor: Extract range_bin parsing and add close arg
  Extracted the 'range_bin' string-to-float conversion into a private function. Additionally, there is now an argument to fine-tune the closed bin edges, so users can specify whether bins are closed on the 'left' or the 'right'. Bins are converted to a pandas interval index before being passed into 'get_MVBS_along_channels'.
* refactor: Update arg types to include interval index
  Added argument type options for 'range_interval' and 'ping_interval' so they can also be an interval index.
* test: Update tests to have brute force creation
  Changed the mock mvbs to be created by brute force rather than hard coding.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* test: Fix brute force mvbs gen
  Fixed the brute-force generation of the expected mvbs and tweaked the mvbs_along_channel test.
* chore: Clean up old code for doing compute MVBS
  Removed the old code that performs compute_MVBS, since we've now switched over to flox.
* chore(pytest): Added custom markers 'unit' and 'integration'
* docs: Update docstring for `compute_MVBS`
  Added options for literal arguments.
* refactor: Change 'parse_range_bin' to 'parse_x_bin'
  Made bin parsing more general by renaming it to 'parse_x_bin'.
* refactor: Initial unification of MVBS and NASC
  Added a setup and validate function for the checks shared between compute MVBS and NASC, so that only the unique checks remain in each individual function.
* fix typo when porting from notebook
* correct attribute units from m to nmi
* refactor: Add typehints and use method
* feat: Add get_x_along_channels
  Added a 'get_x_along_channels' function that generalizes the reduction routines from 'get_MVBS_along_channels'. This removes the old function in the mvbs.py module. Additionally, uses of 'get_MVBS_along_channels' have been removed from the tests and from the code for 'compute_MVBS'.
* feat: Implement new 'compute_NASC'
  Use 'get_x_along_channels' for 'compute_NASC' and turn on the old 'test_nasc.py' for initial nasc testing.
* test: Renamed and moved get_x_along_channels test
* fix: Use 'ffill' and 'bfill'
  Fixed the 'FutureWarning' coming from pandas, since as of pandas version 2.1.0 the 'method' argument for 'fillna' is deprecated.
  Ref: #1167 (comment)
* feat: Allow import 'compute_NASC' from 'commongrid' module
* fix: Fix bug on setup and validate and test
  Fixed a bug in 'setup_and_validate' during variable checks. Also added simple testing for values from flox vs echoview.
* test: Update simple NASC integration test
* test: Add brute force values test for NASC
* chore: Apply suggestions from code review
  Co-authored-by: Wu-Jung Lee <[email protected]>
* refactor: Extract position reduction
* refactor: Separate sv mean and raw computations
* test: Remove empty test_nasc.py
* docs: Update docs for functions
* refactor: Move helper funcs to utils.py
* add L4 processing level to compute_NASC

---------

Co-authored-by: Landung 'Don' Setiawan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Emilio Mayorga <[email protected]>
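
The 'range_bin' handling described in the entries above (a simple regex that requires unit 'm', tolerates a space, and captures decimal values) could look roughly like the sketch below. The helper name `parse_range_bin_sketch`, the exact pattern, and the error messages are assumptions for illustration only; they are not taken from echopype's actual 'parse_x_bin' code.

```python
import re

# Hypothetical pattern: a non-negative integer or decimal, optional whitespace,
# then the unit 'm'. This is an illustrative guess, not echopype's regex.
_RANGE_BIN_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*m\s*$")


def parse_range_bin_sketch(range_bin: str) -> float:
    """Convert a string such as '10m', '10 m', or '0.5m' to a float in meters."""
    if not isinstance(range_bin, str):
        raise TypeError("range_bin must be a string")
    match = _RANGE_BIN_PATTERN.match(range_bin)
    if match is None:
        # Any other unit (or a missing unit) is rejected.
        raise ValueError(f"range_bin must be a number followed by 'm', got {range_bin!r}")
    return float(match.group(1))
```

Anchoring the pattern at both ends is what rejects inputs such as '10km', and the optional `(?:\.\d+)?` group is one way to address the decimal-value issue mentioned in the review fix.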
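
To make the binning behaviour in this commit concrete (averaging in the linear domain via log2lin and lin2log, skipping NaNs as nanmean does, and choosing whether bins are closed on the 'left' or 'right'), here is a minimal pure-Python sketch. The commit itself performs this reduction with flox on xarray data; the function `binned_nanmean_db`, its signature, and the helper signatures below are assumptions made for illustration and do not reproduce echopype's implementation.

```python
import math
from bisect import bisect_left, bisect_right


def log2lin(db: float) -> float:
    """Convert a dB value to the linear domain."""
    return 10 ** (db / 10)


def lin2log(lin: float) -> float:
    """Convert a linear value back to dB."""
    return 10 * math.log10(lin)


def binned_nanmean_db(ranges, sv_db, edges, closed="left"):
    """Mean Sv (dB) per range bin, averaged in the linear domain, ignoring NaNs.

    `edges` must be sorted; bin i spans edges[i]..edges[i + 1].
    Empty bins yield NaN. Hypothetical helper, not part of echopype.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for r, sv in zip(ranges, sv_db):
        if math.isnan(r) or math.isnan(sv):
            continue  # skip NaN samples, as nanmean would
        if closed == "left":   # bins are [lo, hi)
            idx = bisect_right(edges, r) - 1
        else:                  # bins are (lo, hi]
            idx = bisect_left(edges, r) - 1
        if 0 <= idx < n_bins:
            sums[idx] += log2lin(sv)
            counts[idx] += 1
    return [lin2log(s / c) if c else math.nan for s, c in zip(sums, counts)]
```

For example, with `edges=[0, 10, 20]` a sample at range exactly 10 falls in the second bin when `closed="left"` and in the first bin when `closed="right"`, which is the distinction the 'close' argument exposes.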