-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
384 optimize distance to anomaly #423
384 optimize distance to anomaly #423
Conversation
…ce_to_anomaly_gdal_ComputeProximity for reduced run time The earlier functions namely distance_to_anomaly and distance_to_anomaly_gdal are slower as compared to the newly added function. The new function 'distance_to_anomaly_gdal_ComputeProximity' is atleast 97% more efficient than the old Distance_to_anomaly function and is about 75% more efficient than the distance_to_anomaly_gdal function. The output was tested on 5 different raster files and there was no difference in the output file of the three function found.
… reduced run time.
I will do a more in-depth review later but first please add a test to check the implemented functions! See e.g. |
@nialov , do you have time to review this sometime this month? It'd be better if you can, but I can too if needed. |
I will try to review next week @nmaarnio though I can not promise it. Might get delayed to 7.10. due to work trip. Based on a preliminary look, as this uses the @okolekar The tests (All checks have failed) are not running successfully, see e.g.: https://github.com/GispoCoding/eis_toolkit/actions/runs/10809648448/job/30251164407?pr=423 You can check the errors and try to see if you can get them fixed. I can help with this during review but as said before that might take time. |
Hello to all, Regards! |
Do you use a Windows computer and have you installed |
I do use a Windows machine with a conda environment but for development purposes, I have not installed eis_toolkit in this environment. If this still is a problem, I have an idea: we can create a C++ library/function to calculate the distance to anomaly and call this function in python program. The only issue would be to understand how gdal_ComputeProximity works. |
I think the primary question here is whythe
You should try to get your code to work in the environment defined in |
Also I misspoke earlier about Windows difficulties, I was correlating tests failing with issues on Windows platform but the checks here on GitHub run on linux (Ubuntu) only currently. So I suppose the issue is the Linux environment which might be difficult for you to solve if you on Windows. I can take a look but it will sadly get delayed to 7.10. or onwards |
I checked the GDAL version I use |
This branch is based on a somewhat old version of # Check that upstream points to GispoCoding/eis_toolkit:
git remote -v
# If it does not or it does not exist you can report back here. Otherwise, continue:
git fetch upstream
git merge upstream/master
# origin should point to okolekar/eis_toolkit
git push origin You should not run into any conflicts but if the merge is not completed successfully and you get a dirty |
…ote-tracking branch 'upstream/master' into 384-optimize-distance-computation
The issue seems to be that |
My merge is completed without any conflicts. |
I will continue this after 7.10. There might be ways to circumvent the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in terms of functionality seems to be working well in tests and in the notebook. My comments mostly relate to style and trying to minimize code duplication, especially in tests. If you have the interest, you could try to refactor some of the code which is identical in functions _distance_to_anomaly_gdal_ComputeProximity
and _distance_to_anomaly
so that same code is used only once.
@@ -239,3 +240,113 @@ def _distance_to_anomaly( | |||
) | |||
|
|||
return distance_array | |||
|
|||
def _distance_to_anomaly_gdal_ComputeProximity( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use snakecase for function names in Python:
def _distance_to_anomaly_gdal_ComputeProximity( | |
def _distance_to_anomaly_gdal_compute_proximity( |
After renaming, check the code, notebook and tests that you use the new name.
) | ||
#converting True False values to binary formant. | ||
converted_values = np.where(data_fits_criteria,1,0) | ||
driver = gdal.GetDriverByName("MEM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gdal
does not raise exceptions unless told to do so. I implemented for my own code a function, toggle_gdal_exceptions
which is already imported in this file. I suggest to wrap all gdal
code with it. E.g.
with toggle_gdal_exceptions():
driver = gdal.GetDriverByName("MEM")
temp_raster = driver.Create("", width, height, 1, gdal.GDT_Float32)
temp_raster.SetGeoTransform(x_geo + y_geo)
band = temp_raster.GetRasterBand(1)
band.WriteArray(converted_values)
band.SetNoDataValue(nodatavalue)
# Create empty proximity raster
out_raster = driver.Create("", width, height, 1, gdal.GDT_Float32)
out_raster.SetGeoTransform(x_geo + y_geo)
out_raster.SetProjection(crs)
out_band = out_raster.GetRasterBand(1)
out_band.SetNoDataValue(nodatavalue)
options = ['values=1','distunits=GEO']
# Compute proximity
gdal.ComputeProximity(band, out_band, options)
# Create outputs
out_array = out_band.ReadAsArray()
Any gdal
code run in the block will now raise exceptions.
Alternatively, enable the exceptions as I have done in the function before running your code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have enabled the exceptions globally.
This function demonstrates superior performance compared to the distance_to_anomaly | ||
and distance_to_anomaly_gdal functions, as it uses a low-level, C++-based API | ||
within the GDAL library. By directly computing the proximity map from the | ||
source dataset, it benefits from the core-level optimizations inherent to GDAL, | ||
ensuring enhanced efficiency and speed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this below the other paragraph as it is not as relevant to the end-user.
notebooks/distance_to_anomaly.ipynb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the notebook for extra code you might have put there during development. See e.g.
import sys
sys.path.append("E:/EIS/aus_CMAAS_Projekt")
from beak.utilities.raster_processing import calculate_distance_from_raster
in one of the cells.
@@ -90,3 +92,36 @@ def _distance_computation( | |||
) | |||
|
|||
return distance_matrix | |||
|
|||
|
|||
def _distance_computation_optimized( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is currently not being used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not code the function '_distance_computation_optimized'. However, I have removed the function in the new pull request.
"expected_mean", | ||
] | ||
), | ||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you reuse the same pytest
parameters exactly as my test function, I recommend creating a variable with the parameters and using it in both pytest.mark.parametrize
inputs:
EXPECTED_PYTEST_PARAMS = [
pytest.param(
SMALL_RASTER_PROFILE,
SMALL_RASTER_DATA,
5.0,
"lower",
EXPECTED_SMALL_RASTER_SHAPE,
5.694903,
id="small_raster_lower",
),
pytest.param(
SMALL_RASTER_PROFILE,
SMALL_RASTER_DATA,
5.0,
"higher",
EXPECTED_SMALL_RASTER_SHAPE,
6.451948,
id="small_raster_higher",
),
...
]
Usage:
@pytest.mark.parametrize(
",".join(
[
"anomaly_raster_profile",
"anomaly_raster_data",
"threshold_criteria_value",
"threshold_criteria",
"expected_shape",
"expected_mean",
]
),
EXPECTED_PYTEST_PARAMS,
)
def test_distance_to_anomaly_expected(
@pytest.mark.parametrize(
",".join(
[
"anomaly_raster_profile",
"anomaly_raster_data",
"threshold_criteria_value",
"threshold_criteria",
"expected_shape",
"expected_mean",
]
),
EXPECTED_PYTEST_PARAMS,
)
def test_distance_to_anomaly_gdal_ComputeProximity_expected(
Also check if you use same parameters elsewhere and do the same there! (test_distance_to_anomaly_gdal_ComputeProximity_expected_check
)
), | ||
], | ||
) | ||
def test_distance_to_anomaly_gdal_ComputeProximity_expected_check( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would critically review, if checking most of the same exceptions as test_distance_to_anomaly_check
is required here. At the very least, you need to create a variable with the same parameters as instructed in my above review comment.
To avoid repetition of code, I have created a function named '_validate_threshold_criteria' that contains the common code from both the functions ('_distance_to_anomaly_gdal_ComputeProximity' and '_distance_to_anomaly') and both the functions call '_validate_threshold_criteria' internally. |
1) renamed _distance_to_anomaly_gdal_ComputeProximity to _distance_to_anomaly_gdal_compute_proximity 2) enabled toggle_gdal_exceptions globaly in the distance_to_anomaly.py 3) Removed unused code form python notebook 4) Added all the parameters in a single variable in test_distance_to_anomaly_gdal_compute_proximity_expeccted_check
I suppose this is ready for review again? I will take a look in a week (or two). |
Yes, it is ready for review again. Thank you for the support. |
This PR needs to be merged this week if we want to include it in the released planned for Friday this week. If Nikolas is busy, I might jump in and do a review tomorrow so that testers will be able to run the new version before progress meet. |
Sound okay. If there are any changes needed, I am available to modify them as soon as possible to ensure timely delivery of the code. Also, a point to note here is that the changes were mainly cosmetic. |
Hi @okolekar . I only now realized you have been optimizing Either way, it's of course good that you have been optimizing this tool since it's used a lot. However, we direly need also When it comes to actual review of this PR, you should mark the failing test functions to fail if platform is not Windows, so like this: |
Hi @nmaarnio, Regarding the PR, will upload the suggested changes in an hour or two. |
Okay. I think there has been some miscommunication at some point. Anyway, I will rename this PR and relink the issue as this does not concern |
Commented the old distance_to_anomaly_gdal as there were too many functions for the user. Added an exception in the tests for gdal_array as the gdal_array is available only on the Windows system. Further more updates in the notebook are made as the old distance_to_anomaly_gdal is commented and hence is removed here.
@nmaarnio I have made the requested changes. Now I will shift my focus to optimize the Distance_computation in the vector processing tool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After you do the small commenting out change then I'll merge :) thanks for your work with this PR
EDIT: I'll actually do this myself since it's such a small cosmetic thing and then merge the PR 👍🏻
And sorry @nialov for bypassing you in the review process. There is just some pressure to get this, among some other features/improvements, merged before the next progress meeting. You are of course free to review the implementation when you have time and make improvement suggestions even after this version has been merged. |
Hey, no problem @nmaarnio . The pull request was quite fine already with tests confirming the functionality so second review would not have been too important. Also to note that the |
Hello,
I have added the function "distance_to_anomaly_gdal_ComputeProximity" to reduce the runtime for distance computation. The new function "distance_to_anomaly_gdal_ComputeProximity" is a C++ based API function and hence is faster as compared to "distance_to_anomaly" (which was using the point based scheme) and "distance_to_anomaly_gdal" (which was using the python API for distance computation) function. All the three produce the exact same result which is verified in the python notebook distance_to_anomaly.ipynb.
Below I have attached some graphs inorder showcase the reduction in time achieved by the new function.
POINT TO NOTE: I have kept the precommits off due to errors.
Thankyou.