EuroCrops: handle Nones in get_label #2499

burakekim · 2025-01-03T19:08:45Z

get_label now returns 0 when hcat_code is None, so the feature is not rendered.

adamjstewart · 2025-01-03T22:40:30Z

Nice catch!

It looks like HCATv2 actually has a class for this:

"not_known_and_other,3399000000"

I wonder if we should use this instead:

hcat_code = feature['properties'][self.label_name] or '3399000000'

I'm not sure if 0 is ever actually being used or not. Are there any other instances where we return 0?

@favyen2 do you remember why you added 0? Just to appease mypy?
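
A minimal sketch of the `or` fallback suggested above, assuming the label field is EC_hcat_c. The helper name and the sample code '1234567890' are made up for illustration; only the `or '3399000000'` expression comes from the suggestion itself.

```python
NOT_KNOWN_AND_OTHER = '3399000000'  # HCATv2 code for "not_known_and_other"


def hcat_code_or_other(feature, label_name='EC_hcat_c'):
    """Return the feature's HCAT code, or the not_known_and_other code if it is missing.

    Hypothetical helper, not part of the dataset class.
    """
    return feature['properties'][label_name] or NOT_KNOWN_AND_OTHER


# '1234567890' is a placeholder, not a real HCAT code.
labelled = {'properties': {'EC_hcat_c': '1234567890'}}
unlabelled = {'properties': {'EC_hcat_c': None}}

print(hcat_code_or_other(labelled))
print(hcat_code_or_other(unlabelled))
```

Note that `or` replaces every falsy value, so an empty string would also fall back to 3399000000, not only None.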

burakekim · 2025-01-04T10:17:46Z

> It looks like HCATv2 actually has a class for this:

Under Harmonisation with HCAT, it seems EC_hcat_c is already the label itself, and that is the field that is set to None. If I understand correctly, we should discard features whose EC_hcat_c is None, since they were never assigned any value, not even not_known_and_other. I am unsure whether they serve any purpose in the creation of the dataset, though.

I am in favor of removing the print statement that notifies the user each time we hit None, as it becomes very verbose during training.

favyen2 · 2025-01-11T16:48:34Z

> do you remember why you added 0? Just to appease mypy?

Do you mean for the HCAT code or for the label returned by the get_label function?

I don't think I ran into cases where the HCAT code was None or 0, so it may be in a new version of the data or in a geographic area that I did not test.

As for returning 0 from get_label: the function needs to return some integer when the HCAT code is not in the list of codes that the user provides (i.e. the list they care about). Currently it returns the background class 0, which is also used for pixels that are not covered by any polygon. There may be cases where the user wants a different code for the background class, but for this use case I think it would work for them to add a catch-all class to the list of classes passed to the constructor, since the class lookup is hierarchical. For example, with classes=["1234", "5678", "0000"] there is a separate "other" category that is labeled 3 after rasterization, distinct from the background class 0.
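
The catch-all idea can be sketched roughly as below. This is not the dataset's actual get_label: the function name is hypothetical, and the parent rule (zero out the rightmost non-zero digit) is an assumption about how the hierarchy is walked. The class indices follow the example above, where the n-th entry of classes gets index n and 0 is background.

```python
def get_label_sketch(hcat_code, class_map):
    """Map an HCAT code to a class index by walking up the code hierarchy.

    Returns 0 (background) if the code is None or no ancestor is in class_map.
    """
    if hcat_code is None:
        return 0
    code = hcat_code
    while True:
        if code in class_map:
            return class_map[code]
        stripped = code.rstrip('0')
        if not stripped:
            return 0  # reached the all-zero root without a match
        # Assumed parent rule: zero out the rightmost non-zero digit.
        code = (stripped[:-1] + '0').ljust(len(code), '0')


classes = ['1234', '5678', '0000']  # '0000' acts as the catch-all "other" class
class_map = {code: index + 1 for index, code in enumerate(classes)}

print(get_label_sketch('5678', class_map))  # listed class
print(get_label_sketch('9999', class_map))  # falls through to the catch-all
print(get_label_sketch(None, class_map))    # missing code maps to background
```

Without the "0000" entry, the same unlisted code would fall through to the background class 0 instead of a separate "other" index.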

adamjstewart · 2025-01-11T18:11:40Z

Should we use 3399000000 instead of 0? I can also ask the EuroCrops folks if necessary.

favyen2 · 2025-01-11T19:26:04Z

As far as I know, get_label converts the HCAT code into a class integer suitable for training the segmentation model. So after the conversion these returned class indices should correspond to output neurons from the neural model rather than the original codes.

favyen2 · 2025-01-11T19:27:45Z

Oh, I guess you are saying that some users might still find it useful to train on the feature, mapping it to an "other" category (which first setting the code to 3399000000 and then mapping it based on the user-provided category list would enable) rather than being limited to mapping it to 0 which matches the background category.

I'm not sure of the answer to this; it may depend on why the dataset contains these weird features that don't have a code. I guess the safest option would be to mark them invalid, but without digging more into the data I think both are reasonable solutions.

adamjstewart · 2025-01-12T11:21:03Z

> Oh, I guess you are saying that some users might still find it useful to train on the feature, mapping it to an "other" category (which first setting the code to 3399000000 and then mapping it based on the user-provided category list would enable) rather than being limited to mapping it to 0 which matches the background category.

I'm actually saying the exact opposite. I'm wondering if the background class, None, and "not_known_and_other" should all be mapped to the same value. Otherwise, users will have to add all three to ignore_index during training.

burakekim · 2025-01-12T12:10:43Z

I think it would be worth reaching out to the EuroCrops people to ask whether None and not_known_and_other serve any specific purpose across the entire dataset. It is best to find out if there is any reason behind this design choice.

favyen2 · 2025-01-12T23:30:30Z

> I'm actually saying the exact opposite. I'm wondering if the background class, None, and "not_known_and_other" should all be mapped to the same value. Otherwise, users will have to add all three to ignore_index during training.

I think there would be cases where users want to ignore None / not_known_and_other but not background, to try to have the model distinguish areas that are not crop fields.
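
To make the trade-off concrete, here is a small sketch of how a user could decide this themselves after rasterization, assuming a loss that accepts a single ignore_index. The helper name, the ignore value 255, and the toy mask are all hypothetical; in the mask, 0 is background and 3 stands for an "other" class.

```python
IGNORE_VALUE = 255  # hypothetical value passed as ignore_index to the loss


def remap_for_ignore(mask, ignored_classes, ignore_value=IGNORE_VALUE):
    """Return a copy of mask with every class in ignored_classes set to ignore_value."""
    ignored = set(ignored_classes)
    return [[ignore_value if value in ignored else value for value in row] for row in mask]


mask = [[0, 1, 3], [2, 3, 0]]

# Ignore only the "other" class, so the model still learns background vs. crop.
other_ignored = remap_for_ignore(mask, {3})

# Ignore background and "other" together, as if they shared one value.
both_ignored = remap_for_ignore(mask, {0, 3})
```

Keeping the classes distinct in get_label leaves both options open; collapsing them there would rule out the first.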

adamjstewart · 2025-01-13T11:31:17Z

Reply from the EuroCrops folks:

> Depending on the version of EuroCrops you downloaded, some of the None fields either represent duplicated field parcels left blank intentionally or nothing was provided by the source data to begin with. In the latter case, background class would be the way to go. For the former, it depends on what you want to achieve. These are duplicate field parcels with either the same or different crop labels, since most people only wanted one label per parcel, we chose to only map the first appearing parcel/largest area. Depending on your needs, you could also do your own processing, since the original label is still in the file.
> Regarding not_known_and_other, these are predominately either artificial built-up areas or water bodies. Most crops that are non-specific/not-known are in other arable crops or the like. Hope this helps!
> We are also very close to releasing a new HCAT, so maybe everything would be clearer then?

burakekim · 2025-01-13T19:32:08Z

> some of the None fields either represent duplicated field parcels

> These are duplicate field parcels with either the same or different crop labels, since most people only wanted one label per parcel

Does this mean some fields have multiple labels, and only one is used while the rest are mapped to None? If so, it is better not to map None to any other label.

> Regarding not_known_and_other these are predominately either artificial built-up areas or water bodies.

This feels more like background to me.

handle Nones in get_label

068c731

github-actions bot added the datasets label Jan 3, 2025

burakekim added 2 commits January 3, 2025 20:50

make ruff hapy

1b18387

unit test for the win

3590666

github-actions bot added the testing label Jan 3, 2025

adamjstewart added this to the 0.6.3 milestone Jan 3, 2025

burakekim and others added 2 commits January 10, 2025 00:00

remove print statement

4593510

Merge branch 'main' into eurocrops_handlenones

863aaa9

adamjstewart changed the title ~~handle Nones in get_label~~ EuroCrops: handle Nones in get_label Jan 12, 2025
