Consider removing resampling step #13

Open
cvitolo opened this issue Apr 23, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@cvitolo
Collaborator

cvitolo commented Apr 23, 2021

Raw features are in a gridded format (either GRIB or NetCDF) and are resampled to a common spatial resolution in the pre-processing step. After resampling, features are extracted at each point of interest. The machine learning models implemented take as input a table in which features and outcomes are columns and each row corresponds to a point (lon and lat coordinates).

Please note the resampling step introduces errors/loss of information and it's not necessarily needed in this case. Please consider inspecting each raw feature in its original spatial resolution, to minimise information loss.
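To illustrate the suggestion, here is a minimal sketch of sampling each feature on its own native grid at the points of interest, with no common-resolution resampling in between. It assumes regular lat/lon grids held as NumPy arrays and uses a nearest-neighbour lookup; the helper names (`nearest_index`, `extract_at_points`) and the toy grids are hypothetical and not taken from this repository, and the real data would first need to be read from GRIB/NetCDF.

```python
import numpy as np


def nearest_index(axis_values, targets):
    """Return, for each target, the index of the closest coordinate on the axis."""
    axis_values = np.asarray(axis_values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.abs(axis_values[None, :] - targets[:, None]).argmin(axis=1)


def extract_at_points(grid, lats, lons, point_lats, point_lons):
    """Sample a 2-D (lat, lon) grid at the given points (nearest neighbour)."""
    i = nearest_index(lats, point_lats)
    j = nearest_index(lons, point_lons)
    return np.asarray(grid)[i, j]


# Two hypothetical features, each kept on its own native grid.
coarse_lats = np.array([0.0, 1.0, 2.0])          # 1-degree grid
coarse_lons = np.array([0.0, 1.0, 2.0])
coarse = np.arange(9.0).reshape(3, 3)

fine_lats = np.arange(0.0, 2.25, 0.25)           # 0.25-degree grid
fine_lons = np.arange(0.0, 2.25, 0.25)
fine = fine_lats[:, None] + fine_lons[None, :]

# Points of interest (one row of the model input table per point).
point_lats = np.array([0.3, 1.6])
point_lons = np.array([0.9, 1.2])

table = {
    "lat": point_lats,
    "lon": point_lons,
    "coarse_feature": extract_at_points(coarse, coarse_lats, coarse_lons,
                                        point_lats, point_lons),
    "fine_feature": extract_at_points(fine, fine_lats, fine_lons,
                                      point_lats, point_lons),
}
```

Each column is read directly from the source grid, so no feature is interpolated onto another feature's resolution. This sketch ignores longitude wrap-around and missing values, both of which real data may require handling.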

@cvitolo cvitolo added the enhancement New feature or request label Apr 23, 2021
cvitolo added a commit that referenced this issue Apr 30, 2021
@cvitolo
Collaborator Author

cvitolo commented Apr 30, 2021

This issue is solved in branch: cvitolo-patch-1.
The training script will need some modification to accept the new inputs.
I will close this issue when the branch is merged into master.

fdg10371 pushed a commit that referenced this issue Jun 21, 2021
* Updated paths

* Implemented conversion of Burned Area from m2 to hectares before the fuel load is calculated in fuelload.py. This addresses issue #10.

* Added new notebook with more concise pre-processing. It includes the conversion to dataframe (model input). It solves issues #10 and #12. It also defines a new threshold for BA (50 hectares, see #11, the new threshold is defined by FDG - fire expert) but a reliable reference source is still not available.

* Notebook notebooks/preprocess_all_in_one.ipynb reformatted using black

* Formatted src/utils/fuelload.py using black

* Minor changes to notebooks/preprocess_all_in_one.ipynb

* renamed notebook and finalised concise version of the data preparation step

* Complete re-write of the data pre-processing step to avoid resampling. This addresses issue #13.

* Added the following amongst predictors: GFED4 basis regions (as categorical variable) and area of grid cell at point (as continuous variable).

* Load formula changed to BA*CC*AGB/AREA

* added log-transformed variables

* updated notebooks with latest run

* model 6h, MAE

* experiments as in ESA-D1 report

* Updated README files to clarify there are two sets of experiments (by wikilimo and ecmwf).

* Update README.md
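
For reference, the unit conversion and the load expression mentioned in the commit message above can be sketched as follows. This is an illustration only, not the code in `src/utils/fuelload.py`: the function names are hypothetical, the thread does not state what CC and AGB stand for or which units AREA uses, and the function simply evaluates `BA*CC*AGB/AREA` element-wise after converting burned area from m2 to hectares (1 hectare = 10,000 m2).

```python
import numpy as np

M2_PER_HECTARE = 10_000.0  # 1 hectare = 10,000 square metres


def burned_area_m2_to_ha(ba_m2):
    """Convert burned area from square metres to hectares."""
    return np.asarray(ba_m2, dtype=float) / M2_PER_HECTARE


def fuel_load(ba, cc, agb, area):
    """Evaluate BA * CC * AGB / AREA element-wise.

    The meaning and units of cc, agb and area are assumptions here;
    they are not defined in this thread.
    """
    ba, cc, agb, area = (np.asarray(x, dtype=float) for x in (ba, cc, agb, area))
    return ba * cc * agb / area


# Made-up values for two points, purely for illustration.
ba_ha = burned_area_m2_to_ha([500_000.0, 2_000_000.0])
load = fuel_load(ba_ha, cc=[0.5, 0.25], agb=[10.0, 20.0], area=[100.0, 400.0])
```

Converting before the load is computed keeps the burned-area term in the same unit as the 50-hectare threshold mentioned above.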