384 optimize distance to anomaly #423

okolekar · 2024-09-05T07:00:43Z

Hello,
I have added the function "distance_to_anomaly_gdal_ComputeProximity" to reduce the runtime of the distance computation. The new function calls a C++-based API within GDAL and is therefore faster than "distance_to_anomaly" (which uses the point-based scheme) and "distance_to_anomaly_gdal" (which uses the Python API for the distance computation). All three produce exactly the same result, which is verified in the Python notebook distance_to_anomaly.ipynb.
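
To make the computed quantity concrete, here is a minimal brute-force sketch in pure Python. It is not the toolkit's implementation and does not use GDAL; the function name, the strict comparison against the threshold, and the unit pixel size are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def brute_force_distance_to_anomaly(
    grid: Sequence[Sequence[float]],
    threshold: float,
    criteria: str = "higher",
    pixel_size: float = 1.0,
) -> List[List[float]]:
    """Distance from every cell centre to the nearest cell meeting the criteria."""
    if criteria not in ("higher", "lower"):
        raise ValueError(f"Unsupported criteria: {criteria}")

    # Collect (row, col) indices of the anomalous cells.
    anomalies = [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if (value > threshold if criteria == "higher" else value < threshold)
    ]
    if not anomalies:
        raise ValueError("No cell meets the threshold criteria.")

    # For each cell, take the minimum Euclidean distance to any anomalous cell.
    return [
        [
            min(math.hypot(r - ar, c - ac) for ar, ac in anomalies) * pixel_size
            for c in range(len(row))
        ]
        for r, row in enumerate(grid)
    ]


grid = [
    [1.0, 1.0, 1.0],
    [1.0, 9.0, 1.0],
    [1.0, 1.0, 1.0],
]
distances = brute_force_distance_to_anomaly(grid, threshold=5.0, criteria="higher")
```

This version does work proportional to the number of cells times the number of anomalous cells, which illustrates why the runtime of this kind of computation matters on real rasters.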

Below I have attached some graphs to showcase the reduction in runtime achieved by the new function.

POINT TO NOTE: I have kept the pre-commit hooks off due to errors.

Thank you.

…orking yet.

…ce_to_anomaly_gdal_ComputeProximity for reduced run time The earlier functions, namely distance_to_anomaly and distance_to_anomaly_gdal, are slower than the newly added function. The new function 'distance_to_anomaly_gdal_ComputeProximity' is at least 97% more efficient than the old distance_to_anomaly function and about 75% more efficient than the distance_to_anomaly_gdal function. The output was tested on 5 different raster files, and no difference was found between the output files of the three functions.
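
The commit message does not say how the efficiency figures were derived. One plausible reading, assumed here, is a relative runtime reduction. The sketch below shows that arithmetic together with a simple timing helper; the helper names and the timings are hypothetical and are not measurements from this pull request.

```python
import time
from typing import Callable


def mean_runtime(func: Callable[[], object], repeats: int = 5) -> float:
    """Average wall-clock seconds per call of func over a few repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


def runtime_reduction_percent(old_seconds: float, new_seconds: float) -> float:
    """Relative runtime reduction of the new implementation, in percent."""
    if old_seconds <= 0:
        raise ValueError("old_seconds must be positive.")
    return 100.0 * (old_seconds - new_seconds) / old_seconds


# Hypothetical timings chosen only to illustrate the arithmetic.
reduction_vs_point_based = runtime_reduction_percent(old_seconds=100.0, new_seconds=3.0)
reduction_vs_python_api = runtime_reduction_percent(old_seconds=12.0, new_seconds=3.0)
sample_runtime = mean_runtime(lambda: sum(range(1000)))
```

Under this reading, a drop from 100 s to 3 s corresponds to a 97% reduction and a drop from 12 s to 3 s to a 75% reduction.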

… reduced run time.

nialov · 2024-09-05T13:04:54Z

I will do a more in-depth review later but first please add a test to check the implemented functions! See e.g. tests/raster_processing/test_distance_to_anomaly.py for inspiration on how to implement tests or other files in the tests/ directory. The tests are run by pytest. The purpose of the test(s) is to mainly check that the function works with some pre-defined input and the result is as expected (type is correct, value is sensible, etc.). I see you already added examples of use in the notebook but those are not checked automatically on Github actions so it does not replace actual tests.
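
As a rough illustration of the kind of test described above (pre-defined input, then checks on type, shape and a sensible value), here is a minimal sketch. The function under test is a hypothetical one-dimensional stand-in, not the toolkit's function, and the test uses plain assert statements so that pytest can collect it by its test_ prefix.

```python
import math
from statistics import mean
from typing import List, Sequence


def fake_distance_to_anomaly(data: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical stand-in: distance in cells to the nearest value above threshold."""
    anomalies = [i for i, value in enumerate(data) if value > threshold]
    return [min(abs(i - a) for a in anomalies) for i in range(len(data))]


def test_fake_distance_to_anomaly_expected():
    data = [0.0, 0.0, 9.0, 0.0, 0.0]
    result = fake_distance_to_anomaly(data, threshold=5.0)

    # Type and shape are as expected.
    assert isinstance(result, list)
    assert len(result) == len(data)

    # Values are sensible: non-negative, zero at the anomaly itself.
    assert all(distance >= 0 for distance in result)
    assert result[2] == 0

    # A summary statistic guards against silent changes in the output.
    assert math.isclose(mean(result), 1.2)
```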

nmaarnio · 2024-09-20T09:32:59Z

@nialov , do you have time to review this sometime this month? It'd be better if you can, but I can too if needed.

nialov · 2024-09-20T11:05:20Z

I will try to review next week @nmaarnio though I can not promise it. Might get delayed to 7.10. due to work trip.

Based on a preliminary look, as this uses the GDAL API, I can foresee problems with Windows which will delay merging.

@okolekar The tests (All checks have failed) are not running successfully, see e.g.:

https://github.com/GispoCoding/eis_toolkit/actions/runs/10809648448/job/30251164407?pr=423

You can check the errors and try to see if you can get them fixed. I can help with this during review but as said before that might take time.

okolekar · 2024-09-20T12:00:57Z

Hello to all,
I checked the error message @nialov. It appears to me that the test fails due to ModuleNotFoundError: No module named '_gdal_array', which is a problem with the GDAL library. The test passes on my computer.
What I can do is add the following decorator to the test:
@pytest.mark.xfail(
    sys.platform == "win32", reason="GDAL utilities are not available on Windows.", raises=ModuleNotFoundError
)
This essentially marks the test as an expected failure on that platform, so the failure no longer breaks the test run.

Regards!
Omkar Kolekar

nialov · 2024-09-23T05:42:32Z

Do you use a Windows computer, and have you installed eis_toolkit with conda, pip or poetry (or some other method)? If the problem is that the Windows GDAL library does not have the same bindings as the Linux one, then the optimization is not available for Windows users. The target audience probably mostly uses Windows.

okolekar · 2024-09-24T07:26:29Z

I do use a Windows machine with a conda environment, but for development purposes I have not installed eis_toolkit in this environment. If this is still a problem, I have an idea: we can create a C++ library/function to calculate the distance to anomaly and call it from the Python program. The only issue would be to understand how gdal_ComputeProximity works.
Warm Regards,
Omkar

nialov · 2024-09-24T10:44:33Z

I think the primary question here is why the eis_toolkit environment does not have _gdal_array importable in the Linux (and probably Windows) environment. What version of gdal do you use? To check the version:

➜ gdalinfo --version
GDAL 3.9.2, released 2024/08/13

You should try to get your code to work in the environment defined in environment.yaml or in the eis_toolkit environment installed with conda. See instructions/dev_setup_without_docker_with_conda.md and if it does not work in the environment, try to pinpoint why exactly, i.e., what is the difference between your working environment and environments of eis_toolkit.
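
One way to compare two environments is to print the same short report in each of them. The sketch below uses only the standard library; the function names are hypothetical, and it assumes the Python bindings are registered under the distribution name "GDAL", which may not hold for every installation method (in which case None is reported).

```python
import shutil
import subprocess
from importlib import metadata
from typing import Dict, Optional


def gdal_python_version() -> Optional[str]:
    """Version of the installed GDAL Python distribution, or None if not found."""
    try:
        return metadata.version("GDAL")
    except metadata.PackageNotFoundError:
        return None


def gdal_cli_version() -> Optional[str]:
    """Output of `gdalinfo --version`, or None if gdalinfo is not on PATH."""
    executable = shutil.which("gdalinfo")
    if executable is None:
        return None
    completed = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    return completed.stdout.strip() or None


def gdal_environment_report() -> Dict[str, Optional[str]]:
    """Collect both versions so they can be compared across environments."""
    return {"python_bindings": gdal_python_version(), "gdalinfo": gdal_cli_version()}


report = gdal_environment_report()
print(report)
```

Running this in the working environment and in the eis_toolkit environment gives two reports that can be compared side by side.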

nialov · 2024-09-24T11:00:07Z

Also, I misspoke earlier about Windows difficulties. I was correlating the failing tests with issues on the Windows platform, but the checks here on GitHub currently run on Linux (Ubuntu) only. So I suppose the issue is in the Linux environment, which might be difficult for you to solve if you are on Windows. I can take a look, but it will sadly get delayed to 7.10. or later.

okolekar · 2024-09-24T11:49:06Z

I checked the GDAL version I use
gdalinfo --version
GDAL 3.6.2, released 2023/01/02
I do work on Windows OS. I need to check the possibility to work on Linux (Ubuntu).

nialov · 2024-09-24T12:38:21Z

This branch is based on a somewhat old version of eis_toolkit. Can you do a merge and push it? Run the following commands in the repository with your edits, but make sure everything is committed first (no unstaged changes):

# Check that upstream points to GispoCoding/eis_toolkit:
git remote -v
# If it does not or it does not exist you can report back here. Otherwise, continue:
git fetch upstream
git merge upstream/master
# origin should point to okolekar/eis_toolkit
git push origin

You should not run into any conflicts but if the merge is not completed successfully and you get a dirty git status, you can do git merge --abort to abort and report back here.

…ote-tracking branch 'upstream/master' into 384-optimize-distance-computation

nialov · 2024-09-24T13:06:51Z

The issue seems to be that gdal is not installed with numpy bindings when installed with poetry, which disables some raster behavior. I believe I have run into this before. Your tests run successfully on both Linux and Windows when using conda to install the environment.
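
A quick way to confirm this diagnosis in a given environment is to try the relevant imports directly. The following is a minimal standard-library sketch; the helper names are hypothetical, and it only reports whether each module can be imported, not why an import fails.

```python
import importlib
from typing import Dict


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly in the current environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered here.
        return False
    return True


def raster_binding_report() -> Dict[str, bool]:
    """Check numpy, the GDAL bindings, and the numpy-backed gdal_array module."""
    names = ("numpy", "osgeo.gdal", "osgeo.gdal_array")
    return {name: can_import(name) for name in names}


binding_report = raster_binding_report()
print(binding_report)
```

If "osgeo.gdal" imports but "osgeo.gdal_array" does not, that would be consistent with GDAL having been installed without its numpy bindings.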

okolekar · 2024-09-24T13:10:34Z

My merge is completed without any conflicts.

nialov · 2024-09-27T12:16:13Z

I will continue this after 7.10. There might be ways to circumvent the poetry environment issue but I am going to guess that if the code is used with the current imports it cannot be run in the poetry environment but works on conda across platforms (and with pip).

nialov

The code in terms of functionality seems to be working well in tests and in the notebook. My comments mostly relate to style and trying to minimize code duplication, especially in tests. If you have the interest, you could try to refactor some of the code which is identical in functions _distance_to_anomaly_gdal_ComputeProximity and _distance_to_anomaly so that same code is used only once.

nialov · 2024-10-07T11:01:28Z

eis_toolkit/raster_processing/distance_to_anomaly.py

@@ -239,3 +240,113 @@ def _distance_to_anomaly(
    )

    return distance_array
+
+def _distance_to_anomaly_gdal_ComputeProximity(


Please use snake_case for function names in Python:

Suggested change

- def _distance_to_anomaly_gdal_ComputeProximity(
+ def _distance_to_anomaly_gdal_compute_proximity(

After renaming, check the code, notebook and tests that you use the new name.
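
For finding any remaining mixed-case names, a small helper along these lines could be used. This is only a sketch with hypothetical function names; it handles simple lowercase-to-uppercase boundaries and is not a complete naming-convention checker.

```python
import re


def to_snake_case(name: str) -> str:
    """Insert "_" at each lowercase/digit-to-uppercase boundary, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def is_snake_case(name: str) -> bool:
    """True for lowercase words joined by single underscores, with optional leading underscores."""
    return re.fullmatch(r"_*[a-z0-9]+(_[a-z0-9]+)*", name) is not None


old_name = "_distance_to_anomaly_gdal_ComputeProximity"
new_name = to_snake_case(old_name)
```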

nialov · 2024-10-07T11:12:03Z

eis_toolkit/raster_processing/distance_to_anomaly.py

+        )
+    #converting True False values to binary formant.
+    converted_values = np.where(data_fits_criteria,1,0)
+    driver = gdal.GetDriverByName("MEM")


gdal does not raise exceptions unless told to do so. For my own code I implemented a function, toggle_gdal_exceptions, which is already imported in this file. I suggest wrapping all gdal code with it, e.g.:

with toggle_gdal_exceptions():
    driver = gdal.GetDriverByName("MEM")
    temp_raster = driver.Create("", width, height, 1, gdal.GDT_Float32)
    temp_raster.SetGeoTransform(x_geo + y_geo)
    band = temp_raster.GetRasterBand(1)
    band.WriteArray(converted_values)
    band.SetNoDataValue(nodatavalue)

    # Create empty proximity raster
    out_raster = driver.Create("", width, height, 1, gdal.GDT_Float32)
    out_raster.SetGeoTransform(x_geo + y_geo)
    out_raster.SetProjection(crs)
    out_band = out_raster.GetRasterBand(1)
    out_band.SetNoDataValue(nodatavalue)
    options = ['values=1','distunits=GEO']

    # Compute proximity
    gdal.ComputeProximity(band, out_band, options)

    # Create outputs
    out_array = out_band.ReadAsArray()

Any gdal code run in the block will now raise exceptions.

Alternatively, enable the exceptions as I have done in the function before running your code.
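
For readers unfamiliar with the pattern, here is a minimal sketch of how such a context manager could work. It is not necessarily how toggle_gdal_exceptions is implemented in eis_toolkit. To keep the example runnable without GDAL, it is demonstrated on a stand-in class (FakeGdal, hypothetical) that mimics the UseExceptions, DontUseExceptions and GetUseExceptions functions of osgeo.gdal.

```python
from contextlib import contextmanager


class FakeGdal:
    """Stand-in mimicking the three exception-toggle functions of osgeo.gdal."""

    def __init__(self) -> None:
        self._use_exceptions = False

    def GetUseExceptions(self) -> int:
        return int(self._use_exceptions)

    def UseExceptions(self) -> None:
        self._use_exceptions = True

    def DontUseExceptions(self) -> None:
        self._use_exceptions = False


@contextmanager
def enable_exceptions(gdal_module):
    """Enable exceptions inside the block and restore the previous state afterwards."""
    was_enabled = bool(gdal_module.GetUseExceptions())
    gdal_module.UseExceptions()
    try:
        yield
    finally:
        # Only switch exceptions off again if they were off before the block.
        if not was_enabled:
            gdal_module.DontUseExceptions()


fake = FakeGdal()
with enable_exceptions(fake):
    state_inside = fake.GetUseExceptions()
state_after = fake.GetUseExceptions()
```

Restoring the previous state in the finally clause means code outside the block keeps whatever behavior it had before, even if the wrapped code raises.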

I have enabled the exceptions globally.

nialov · 2024-10-07T11:16:54Z

eis_toolkit/raster_processing/distance_to_anomaly.py

+    This function demonstrates superior performance compared to the distance_to_anomaly 
+    and distance_to_anomaly_gdal functions, as it uses a low-level, C++-based API 
+    within the GDAL library. By directly computing the proximity map from the 
+    source dataset, it benefits from the core-level optimizations inherent to GDAL, 
+    ensuring enhanced efficiency and speed.


Move this below the other paragraph as it is not as relevant to the end-user.

nialov · 2024-10-07T11:33:48Z

notebooks/distance_to_anomaly.ipynb

Please check the notebook for extra code you might have put there during development. See e.g.

import sys
sys.path.append("E:/EIS/aus_CMAAS_Projekt")
from beak.utilities.raster_processing import calculate_distance_from_raster

in one of the cells.

nialov · 2024-10-07T11:39:23Z

eis_toolkit/vector_processing/distance_computation.py

@@ -90,3 +92,36 @@ def _distance_computation(
    )

    return distance_matrix
+
+
+def _distance_computation_optimized(


This function is currently not being used anywhere.

I did not code the function '_distance_computation_optimized'. However, I have removed the function in the new pull request.

nialov · 2024-10-07T11:46:09Z

tests/raster_processing/test_distance_to_anomaly.py

+            "expected_mean",
+        ]
+    ),
+    [    


As you reuse exactly the same pytest parameters as my test function, I recommend creating a variable with the parameters and using it in both pytest.mark.parametrize inputs:

EXPECTED_PYTEST_PARAMS = [
    pytest.param(
        SMALL_RASTER_PROFILE,
        SMALL_RASTER_DATA,
        5.0,
        "lower",
        EXPECTED_SMALL_RASTER_SHAPE,
        5.694903,
        id="small_raster_lower",
    ),
    pytest.param(
        SMALL_RASTER_PROFILE,
        SMALL_RASTER_DATA,
        5.0,
        "higher",
        EXPECTED_SMALL_RASTER_SHAPE,
        6.451948,
        id="small_raster_higher",
    ),
    ...
]

Usage:

@pytest.mark.parametrize(
    ",".join(
        [
            "anomaly_raster_profile",
            "anomaly_raster_data",
            "threshold_criteria_value",
            "threshold_criteria",
            "expected_shape",
            "expected_mean",
        ]
    ),
    EXPECTED_PYTEST_PARAMS,
)
def test_distance_to_anomaly_expected(

@pytest.mark.parametrize(
    ",".join(
        [
            "anomaly_raster_profile",
            "anomaly_raster_data",
            "threshold_criteria_value",
            "threshold_criteria",
            "expected_shape",
            "expected_mean",
        ]
    ),
    EXPECTED_PYTEST_PARAMS,
)
def test_distance_to_anomaly_gdal_ComputeProximity_expected(

Also check if you use same parameters elsewhere and do the same there! (test_distance_to_anomaly_gdal_ComputeProximity_expected_check)

nialov · 2024-10-07T11:58:22Z

tests/raster_processing/test_distance_to_anomaly.py

+        ),
+    ],
+)
+def test_distance_to_anomaly_gdal_ComputeProximity_expected_check(


I would critically review whether checking most of the same exceptions as test_distance_to_anomaly_check is required here. At the very least, you need to create a variable with the shared parameters, as instructed in my review comment above.

okolekar · 2024-10-16T08:40:02Z

To avoid repetition of code, I have created a function named '_validate_threshold_criteria' that contains the common code from both the functions ('_distance_to_anomaly_gdal_ComputeProximity' and '_distance_to_anomaly') and both the functions call '_validate_threshold_criteria' internally.
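
The body of '_validate_threshold_criteria' is not shown in this thread, so the following is only a sketch of what a shared validation helper of this kind might look like. The accepted criteria beyond "lower" and "higher", the expected value types, and the use of ValueError are all assumptions made for illustration.

```python
from numbers import Number
from typing import Sequence, Union

# "lower" and "higher" appear in the tests; the range-based names are assumed.
SINGLE_VALUE_CRITERIA = ("lower", "higher")
RANGE_CRITERIA = ("in_between", "outside")


def validate_threshold_criteria(
    threshold_criteria: str,
    threshold_criteria_value: Union[Number, Sequence[Number]],
) -> None:
    """Raise ValueError if the criteria name and its value do not fit together."""
    if threshold_criteria in SINGLE_VALUE_CRITERIA:
        is_number = isinstance(threshold_criteria_value, Number)
        if not is_number or isinstance(threshold_criteria_value, bool):
            raise ValueError(f"'{threshold_criteria}' expects a single number.")
    elif threshold_criteria in RANGE_CRITERIA:
        value = threshold_criteria_value
        if not isinstance(value, (tuple, list)) or len(value) != 2:
            raise ValueError(f"'{threshold_criteria}' expects a (low, high) pair.")
        low, high = value
        if low >= high:
            raise ValueError("The lower bound must be smaller than the upper bound.")
    else:
        raise ValueError(f"Unknown threshold criteria: {threshold_criteria}")


# Both distance functions could call the helper before doing any raster work.
validate_threshold_criteria("lower", 5.0)
validate_threshold_criteria("in_between", (1.0, 5.0))
```

Keeping the checks in one place means both functions reject invalid input in the same way and a single set of tests can cover the validation.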

1) Renamed _distance_to_anomaly_gdal_ComputeProximity to _distance_to_anomaly_gdal_compute_proximity
2) Enabled toggle_gdal_exceptions globally in distance_to_anomaly.py
3) Removed unused code from the Python notebook
4) Added all the parameters in a single variable in test_distance_to_anomaly_gdal_compute_proximity_expected_check

nialov · 2024-10-21T11:13:41Z

I suppose this is ready for review again? I will take a look in a week (or two).

okolekar · 2024-10-21T11:15:28Z

Yes, it is ready for review again. Thank you for the support.
I have also taken care of precommits this time, as I have solved the issues with the precommits.

nmaarnio · 2024-10-21T11:26:19Z

This PR needs to be merged this week if we want to include it in the release planned for Friday this week. If Nikolas is busy, I might jump in and do a review tomorrow so that testers will be able to run the new version before the progress meeting.

okolekar · 2024-10-21T11:37:49Z

Sounds okay. If any changes are needed, I am available to make them as soon as possible to ensure timely delivery of the code. Also, a point to note here is that the changes were mainly cosmetic.

nmaarnio · 2024-10-22T08:47:45Z

Hi @okolekar. I only now realized you have been optimizing distance_to_anomaly here, not distance_computation. The issue and PR name refer to distance_computation, which is the vector processing tool that the original implementation of distance_to_anomaly was using under the hood. Did you spend any time thinking about optimizing distance_computation, or only distance_to_anomaly so far?

Either way, it's of course good that you have been optimizing this tool since it's used a lot. However, we also direly need distance_computation optimized since it's a commonly used tool in MPM. Optimization of distance_computation would also optimize vector_density.

When it comes to the actual review of this PR, you should mark the failing test functions as expected to fail if the platform is not Windows, like this:
@pytest.mark.xfail(
    sys.platform != "win32", reason="gdal_array available only on Windows.", raises=ModuleNotFoundError
)
Also, I think we could comment out the old distance_to_anomaly_gdal tool if it is not used; the module now has a lot of tools, which might confuse users.

okolekar · 2024-10-22T09:18:28Z

Hi @nmaarnio,
I was asked to optimize the distance_to_anomaly only.
However, I can start with that tool as well. But it will take some time.

Regarding the PR, will upload the suggested changes in an hour or two.
Thanks!

nmaarnio · 2024-10-22T09:23:17Z

Okay. I think there has been some miscommunication at some point. Anyway, I will rename this PR and relink the issue, as this does not concern distance_computation.

Commented out the old distance_to_anomaly_gdal as there were too many functions for the user. Added an exception in the tests for gdal_array, as gdal_array is available only on Windows. Furthermore, the notebook is updated: since the old distance_to_anomaly_gdal is commented out, it has been removed from the notebook.

okolekar · 2024-10-22T11:13:34Z

@nmaarnio I have made the requested changes. Now I will shift my focus to optimizing the distance_computation tool in vector processing.

nmaarnio

After you do the small commenting out change then I'll merge :) thanks for your work with this PR

EDIT: I'll actually do this myself since it's such a small cosmetic thing and then merge the PR 👍🏻

eis_toolkit/raster_processing/distance_to_anomaly.py

nmaarnio · 2024-10-23T07:42:50Z

And sorry @nialov for bypassing you in the review process. There is just some pressure to get this, among some other features/improvements, merged before the next progress meeting. You are of course free to review the implementation when you have time and make improvement suggestions even after this version has been merged.

nialov · 2024-10-24T06:36:52Z

Hey, no problem @nmaarnio. The pull request was quite fine already, with tests confirming the functionality, so a second review would not have been too important.

Also note that the gdal_array import fails because of poetry, independent of platform. So it would work if the test environment were installed with pip on Linux. Of course, poetry is only used on Linux, so disabling the tests on that platform works.

nmaarnio and others added 3 commits May 2, 2024 11:56

Distance computation optimization WIP. The optimized version is not w…

b6d58c5

…orking yet.

added function distance_to_anomaly_gdal_ComputeProximity function for…

c422e64

… reduced run time.

okolekar linked an issue Sep 5, 2024 that may be closed by this pull request

Optimize Distance computation #384

Open

okolekar marked this pull request as draft September 5, 2024 07:03

okolekar marked this pull request as ready for review September 5, 2024 07:07

Added tests to check the new implemented functions

003d213

"Added test to test the new gdal ComputeProximity function "Merge rem…

6c8596d

…ote-tracking branch 'upstream/master' into 384-optimize-distance-computation

nialov requested changes Oct 7, 2024

nmaarnio removed a link to an issue Oct 22, 2024

Optimize Distance computation #384

Open

nmaarnio linked an issue Oct 22, 2024 that may be closed by this pull request

Optimize Distance to anomaly #409

Closed

nmaarnio changed the title ~~384 optimize distance computation~~ 384 optimize distance to anomaly Oct 22, 2024

nmaarnio reviewed Oct 23, 2024

nmaarnio merged commit c66cd68 into GispoCoding:master Oct 23, 2024
4 checks passed
