Fix lint errors in datacube script #36
Conversation
Wrapping docstrings in scripts/datacube.py to under 88 characters.
Fixes `F841 Local variable 'best_nodata' is assigned to but never used`. Previously only the best_clouds variable was used and best_nodata was ignored, but both should factor into the selection. This is now done in a single pandas sort_values call.
scripts/datacube.py (Outdated)

-        best_nodata = (
-            s2_items_gdf[["s2:nodata_pixel_percentage"]]
-            .groupby(["s2:nodata_pixel_percentage"])
-            .sum()
-            .sort_values(by="s2:nodata_pixel_percentage", ascending=True)
-            .index[0]
-        )
-
-        best_clouds = (
-            s2_items_gdf[["eo:cloud_cover"]]
-            .groupby(["eo:cloud_cover"])
-            .sum()
-            .sort_values(by="eo:cloud_cover", ascending=True)
-            .index[0]
-        )
+        least_nodata_and_clouds = s2_items_gdf.sort_values(
+            by=["s2:nodata_pixel_percentage", "eo:cloud_cover"], ascending=True
+        ).index[0]
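The combined sort in the diff above can be sketched in isolation. The toy DataFrame below is made-up illustrative data, not the repository's actual GeoDataFrame; it only shows how sorting on two columns at once replaces the separate best_nodata / best_clouds groupbys:

```python
import pandas as pd

# Toy stand-in for s2_items_gdf: three Sentinel-2 items with differing
# nodata percentages and cloud cover values.
s2_items_gdf = pd.DataFrame(
    {
        "s2:nodata_pixel_percentage": [0.000017, 0.000000, 0.000169],
        "eo:cloud_cover": [40.74, 33.13, 1.93],
    }
)

# Sort by nodata percentage first, then cloud cover as the tie-breaker,
# and take the index of the best (first) row.
least_nodata_and_clouds = s2_items_gdf.sort_values(
    by=["s2:nodata_pixel_percentage", "eo:cloud_cover"], ascending=True
).index[0]

print(least_nodata_and_clouds)  # → 1, the row with zero nodata wins despite higher clouds
```

Note that because nodata is the primary key, a nearly cloud-free item can still lose to one with marginally less nodata, which is exactly the trade-off discussed below.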
@lillythomas, the linter was picking up that the best_nodata variable wasn't used. I've refactored this to sort based on lowest s2:nodata_pixel_percentage first, and then lowest eo:cloud_cover, so the resulting dataframe (searching between dates 2020-01-13/2020-01-19) would look something like this:
datetime s2:nodata_pixel_percentage eo:cloud_cover
6 2020-01-17T18:47:09.024000Z 0.000000 33.132727
5 2020-01-22T18:46:51.024000Z 0.000017 40.740009
1 2020-02-11T18:45:11.024000Z 0.000043 2.595613
0 2020-02-16T18:44:39.024000Z 0.000056 10.043734
4 2020-01-27T18:46:29.024000Z 0.000066 12.765600
2 2020-02-06T18:45:39.024000Z 0.000136 2.524842
3 2020-02-01T18:46:11.024000Z 0.000169 1.933268
And we would pick the first row here (datetime: 2020-01-17T18:47:09.024000Z). Does this look ok? Or should we change the logic a bit (e.g. as long as the nodata_pixel_percentage is <20%, we just pick the lowest cloud cover, which would then be datetime: 2020-02-01T18:46:11.024000Z)?
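The alternative being proposed here could be sketched as follows. This is a hypothetical illustration with made-up data (not code from scripts/datacube.py), assuming a 20% nodata threshold: filter out items above the threshold first, then pick the minimum cloud cover among what remains.

```python
import pandas as pd

# Made-up example items: the third has the lowest cloud cover but
# exceeds the 20% nodata threshold.
s2_items_gdf = pd.DataFrame(
    {
        "s2:nodata_pixel_percentage": [0.000000, 0.000169, 25.0],
        "eo:cloud_cover": [33.13, 1.93, 0.5],
    }
)

# Keep only items under the nodata threshold, then take the index of
# the least cloudy remaining item.
candidates = s2_items_gdf[s2_items_gdf["s2:nodata_pixel_percentage"] < 20]
best_clouds = candidates["eo:cloud_cover"].idxmin()

print(best_clouds)  # → 1: the 25% nodata item is excluded despite its lowest clouds
```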
Yes, I knew that best_nodata wasn't being used, because I was mainly thinking that optimizing for least cloud cover is our best bet, given that there is a pre-existing filter for nodata by way of {"op": "<=", "args": [{"property": "s2:nodata_pixel_percentage"}, nodata_pixel_percentage]} in the search query. I think this is a good alternative, but in some/many cases it won't optimize for cloud cover (as this example shows, where the first row has a nontrivial amount of cloudy pixels, 33%).
Ok, so we would just pick the least cloudy image then, as long as the nodata_pixel_percentage is below the 20% threshold? I.e. simply remove the best_nodata variable and stick to having just best_clouds?
Yes, I think the best call is for us to just remove the best_nodata variable 👍
Done at f65e34a!
Using ds_ prefix for xr.Dataset objects, and da_ prefix for xr.DataArray objects.
Increase ruff's max-args limit from the default value of 5 to 6.
@@ -13,3 +13,6 @@ select = [
     "UP",  # pyupgrade
     "W",   # pycodestyle warnings
 ]
+
+[tool.ruff.lint.pylint]
+max-args = 6
Increasing the maximum allowed arguments to functions from the default of 5 to 6, to avoid these errors:
scripts/datacube.py:293:5: PLR0913 Too many arguments in function definition (6 > 5)
scripts/datacube.py:428:5: PLR0913 Too many arguments in function definition (6 > 5)
Alternatively, we could look at simplifying those functions to take fewer arguments.
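One common way to simplify such signatures, rather than raising max-args, is to bundle related parameters into a dataclass. The names below are purely illustrative and are not the actual signatures in scripts/datacube.py:

```python
from dataclasses import dataclass


@dataclass
class SearchParams:
    """Hypothetical bundle of related search settings (illustrative only)."""

    start_date: str
    end_date: str
    cloud_cover_percentage: float
    nodata_pixel_percentage: float


def search_sentinel2(params: SearchParams, catalog: str, epsg: int) -> tuple:
    # Three parameters instead of six; the related search settings
    # travel together and stay under ruff's PLR0913 limit.
    return (params.start_date, params.end_date, catalog, epsg)


result = search_sentinel2(
    SearchParams("2020-01-13", "2020-01-19", 50.0, 20.0), "earth-search", 32618
)
```

The trade-off is an extra type to maintain, so bumping max-args to 6 (as this PR does) is a reasonable lighter-weight fix.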
Fixes `FutureWarning: get_all_items() is deprecated, use item_collection() instead`.
No need to sort by `s2:nodata_pixel_percentage` anymore, just get the Sentinel-2 STAC item with the least cloud cover.
Missed a few more da_ to ds_ renames, following from 2af24be
Looks good overall. If I can be extra picky, could we also use lowercase for some variables that are currently uppercase but are not constants? Examples:

Line 409 in 3608d6d:
    date, YEAR, MONTH, DAY, CLOUD = get_conditions(year1, year2, cloud_cover_percentage)

Line 239 in 3608d6d:
    def search_dem(BBOX, catalog, epsg):
Let's leave the remaining improvements for later. I will merge this so I can work on the file.
Fixing some pre-commit lint errors from #27, mostly by wrapping line-lengths of docstrings to under 88 characters.