This model classifies crop types for each field based on the field as well as on its surroundings.
In the Zindi AgriFieldNet India Challenge
this was the third place solution by the team re-union
in the final round to classify crop
types in agricultural fields across Northern India using multispectral
observations from Sentinel-2 satellite.
MLHub model id: model_ecaas_agrifieldnet_bronze_v1
. Browse on Radiant MLHub.
AgriFieldNet Competition Dataset
Ferreira, M.G. and Dung, T. "Looking further: a crop type classification model for fields", Version 1.0, Radiant MLHub. [Date Accessed] Radiant MLHub https://doi.org/10.34911/rdnt.iaki1s
- MG Ferreira - Ferra Solutions www.ferrasolutions.com LinkedIn
- Tien Dung
Use LinkedIn to message.
The applicable spatial extent, for new inferencing.
{
"type": "FeatureCollection",
"features": [
{
"properties": {
"id": "ref_agrifieldnet_competition_v1"
},
"type": "Feature",
"geometry": {
"type": "MultiPolygon",
"bbox": [
76.2448,
18.9414,
88.046,
28.327
],
"coordinates": [
[
[
[
88.046,
18.9414
],
[
88.046,
28.327
],
[
76.2448,
28.327
],
[
76.2448,
18.9414
],
[
88.046,
18.9414
]
]
]
]
}
}
]
}
The recommended start/end date of imagery for new inferencing.
Start | End |
---|---|
2022-01-01 | present |
- Supervised
- Classification
This will calculate descriptive statistics for each field as well as for a number of concentric borders around it. It will also classify the terrain type around the field. All of these features are then clustered, up-sampled and finally fed into a random forest for classification.
- MacOS (darwin)
- CPU
Review the GitHub repository README to get started running this model for new inferencing.
A year ago our model got top score in the similar "Spot the Crop" competition. There you had to classify field crop types based on Sentinel S1 and S2 and also based on just S2 imagery (thus two streams).
That time, we used a gradient booster and just a few derivative bands and it was enough.
For this competition, we started there. However, it soon became apparent that this was a lot more difficult. The imagery was a lot more homogeneous and it became really difficult to discriminate between crop types. The previous approach simply did not work well this time.
Even in the previous challenge, we felt that we were not making use of the surroundings enough. If a river or mountain lay next to a field, surely it would influence the crop type! So this time we tried to also use the "dark" pixels - those pixels that are outside of any field's boundary.
We did this by creating a thick border around each field, and then a border around that border and so on. Each of these concentric borders were then evaluated and added to the list of features for each field.
We also used a pre-trained model to classify the terrain to the north, east, south and west of each field. We also applied the classification to the field itself to try and squeeze a bit extra out of the information.
On the very last day of the competition, we also added the distribution of crop types of fields in the vicinity of the field being classified. This is not a new idea, we used it in the "Spot the Crop" challenge, and here it did add a bit to our score.
One of the best improvements we got in our score was when we started to cluster the data and then to pass the clusters on as features to the model. This made a huge difference to our position on the leaderboard and was one of our big breakthroughs for this competition. I think the reason for this is that we used a lot of derived bands and our features were very correlated, so that the clustering helped reduce that and helped the model fit better.
The dataset is highly imbalanced and we got a bit of an improvement in score when we oversampled the data to get better class representation.
This time around, a gradient booster did not work as well, and we ended up using a random forest. The reason for this also, I think, is the correlation between the features given to the model.
The model itself is a random forest. A gradient booster does not work as well because the features are highly correlated.
The data is processed first. For each band and each field, a number of descriptive statistics such as mean, median, standard deviation and so on are calculated. Then a number of concentric borders are drawn around the field of varying thickness. The bands in each border is analyzed in the same manner and contributes more descriptive statistic features. A pre-trained EarthNet model is also used to classify the terrain to the north, east, west and south of the field and this classification is added to the features. Next the features are clustered and then up-sampled to be more balanced.
Since a field can span multiple tiles, two outputs are prepared.
- A normal CSV file where the crop type probability prediction is averaged across tiles for each field.
- A CSV file ending in
-w.csv
where the probabilities are weighted according to the size of each field in each tile (this is the output used in the competition).
See also the full_solution/ folder, which documents the ensemble model used in the Zindi AgriFieldNet India Challenge.