
How to get landsea.zarr file? #59

Open
dongZheX opened this issue Feb 6, 2023 · 8 comments
@dongZheX

dongZheX commented Feb 6, 2023

Thanks for the code.

I have tried to run train/run.py, but the landsea.zarr file is missing. How do I get this file?

self.landsea = xr.open_zarr("/home/bieker/Downloads/landsea.zarr", consolidated=True).load()
(If I've simply overlooked it, please don't mind -.-)
Thanks.

@jacobbieker
Member

It's available here: https://huggingface.co/datasets/openclimatefix/gfs-reforecast/blob/main/data/invariant/landsea.zarr.zip. It's just the ERA5 land/sea mask, so it's also available from the CDS and ECMWF websites.
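
For reference, here is a minimal sketch of fetching and opening the mask in Python. The Hugging Face repo id and file path are taken from the link above; the local paths and zip layout are assumptions, so the extraction step may need adjusting.

# Sketch only: download the zipped zarr from the Hugging Face dataset linked
# above, unzip it locally, and open it the same way train/run.py does.
import zipfile

import xarray as xr
from huggingface_hub import hf_hub_download

zip_path = hf_hub_download(
    repo_id="openclimatefix/gfs-reforecast",
    filename="data/invariant/landsea.zarr.zip",
    repo_type="dataset",
)
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("landsea.zarr")  # adjust if the archive already contains a landsea.zarr/ directory

# Mirrors the xr.open_zarr(..., consolidated=True) call quoted in the issue.
landsea = xr.open_zarr("landsea.zarr", consolidated=True).load()
print(landsea)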

@dongZheX
Author

dongZheX commented Feb 8, 2023


Thanks.

I'm new to weather forecasting. Thanks for the patient answer. I am trying to reproduce GraphCast based on your code.

Now I am trying to run train/run.py, but I have found some issues in the code.

At present:

  • The default value of resolution for XrDataset should be "2deg", not "2.0deg"; otherwise it leads to errors about the data shape.
  • There are NaN values in the inputs, which make the model output contain NaN, so the program raises an assertion (the standard deviation of "sr", which is used to normalize the land-sea data, is 0.0). However, even after removing "sr", it still crashes:

[1, 1] loss: 0.144 Time: 2.1553783416748047 sec
[1, 2] loss: 0.145 Time: 1.466963529586792 sec
[1, 3] loss: 0.153 Time: 1.4628980159759521 sec
Traceback (most recent call last):
  File "run.py", line 497, in <module>
    loss = criterion(outputs, labels)
  File "/home/work/dongzhe05/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ssd3/dongzhe05/projects/graph_weather/graph_weather/models/losses.py", line 57, in forward
    assert not torch.isnan(out).any()
AssertionError

Thanks again.

@dongZheX
Author

dongZheX commented Feb 9, 2023

Now I skip the data containing NaN, and the loss stays around 0.135 in epoch 2.

[2,  1626] loss: 0.135 Time: 1.461592197418213 sec
[2,  1627] loss: 0.135 Time: 1.4107253551483154 sec
[2,  1628] loss: 0.135 Time: 1.411616325378418 sec
[2,  1629] loss: 0.135 Time: 1.4023494720458984 sec
[2,  1630] loss: 0.135 Time: 1.4118192195892334 sec
[2,  1631] loss: 0.135 Time: 1.413177728652954 sec
[2,  1632] loss: 0.135 Time: 1.4135842323303223 sec
[2,  1633] loss: 0.135 Time: 1.4481475353240967 sec
[2,  1634] loss: 0.135 Time: 1.4366750717163086 sec
[2,  1635] loss: 0.135 Time: 1.4205167293548584 sec
[2,  1636] loss: 0.135 Time: 1.4060397148132324 sec
[2,  1637] loss: 0.135 Time: 1.4105403423309326 sec
[2,  1638] loss: 0.135 Time: 1.4151151180267334 sec
[2,  1639] loss: 0.135 Time: 1.412851333618164 sec
[2,  1640] loss: 0.135 Time: 1.411670207977295 sec
[2,  1641] loss: 0.135 Time: 1.4512145519256592 sec
[2,  1642] loss: 0.135 Time: 1.4370694160461426 sec
[2,  1643] loss: 0.135 Time: 1.4110925197601318 sec
[2,  1644] loss: 0.135 Time: 1.4113273620605469 sec
[2,  1645] loss: 0.135 Time: 1.4102823734283447 sec
[2,  1647] loss: 0.135 Time: 1.4005143642425537 sec
[2,  1648] loss: 0.135 Time: 1.411473274230957 sec
[2,  1649] loss: 0.135 Time: 1.4430499076843262 sec
[2,  1651] loss: 0.134 Time: 1.414628505706787 sec
[2,  1652] loss: 0.135 Time: 1.4113256931304932 sec
[2,  1653] loss: 0.135 Time: 1.396705150604248 sec
[2,  1654] loss: 0.134 Time: 1.4125947952270508 sec
[2,  1655] loss: 0.135 Time: 1.4116289615631104 sec
[2,  1656] loss: 0.135 Time: 1.4174928665161133 sec
[2,  1657] loss: 0.135 Time: 1.4478068351745605 sec
[2,  1658] loss: 0.135 Time: 1.4566729068756104 sec
[2,  1660] loss: 0.134 Time: 1.3963241577148438 sec
[2,  1661] loss: 0.134 Time: 1.4090869426727295 sec
[2,  1662] loss: 0.134 Time: 1.4048657417297363 sec
[2,  1664] loss: 0.134 Time: 1.4057867527008057 sec
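
(For reference, a minimal, self-contained sketch of this kind of NaN check; the helper name is illustrative and not code from train/run.py.)

import torch

def batch_has_nan(*tensors: torch.Tensor) -> bool:
    """Return True if any of the given tensors contains a NaN value."""
    return any(torch.isnan(t).any().item() for t in tensors)

# A batch with a single NaN is flagged, so a training loop can `continue`
# past it instead of letting NaN propagate into the model and the loss.
inputs = torch.randn(2, 4)
inputs[0, 1] = float("nan")
labels = torch.randn(2, 4)
print(batch_has_nan(inputs, labels))  # True -> skip this batch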

@jacobbieker
Member

Hmmm... yeah, I'm not sure why that's the case. There could be a bug in the code; when I've tried training it, it's been quite slow before.

@peterdudfield
Contributor

@all-contributors please add @dongZheX for question

@allcontributors
Contributor

@peterdudfield

I've put up a pull request to add @dongZheX! 🎉

@Esperanto-mega

Esperanto-mega commented Apr 19, 2023

Now I skip the data containing NaN, and the loss stays around 0.135 in epoch 2. […]

Hi dongZheX, have you addressed the issue of NaN values in the downloaded data?

@Liu990406

Hello. The static geographic data is normalized in the code by subtracting the mean and dividing by the standard deviation. However, in const.py, LANDSEA_STD has a zero standard deviation for the variable "sr" (LANDSEA_STD = {"sr": 0.0, ...}), which makes the normalized values blow up to infinity. This can produce NaN values in the model predictions. After commenting out the corresponding code, it runs normally.
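
(A small runnable sketch of the failure mode described above; the LANDSEA_MEAN value is made up for illustration, and only the zero "sr" entry in LANDSEA_STD comes from const.py.)

import numpy as np

LANDSEA_MEAN = {"sr": 0.5}  # illustrative value only
LANDSEA_STD = {"sr": 0.0}   # the zero standard deviation reported above

sr = np.array([0.2, 0.5, 0.9])
normalized = (sr - LANDSEA_MEAN["sr"]) / LANDSEA_STD["sr"]
print(normalized)  # dividing by a zero std gives inf (and nan for 0/0), which then flows into the model

# One possible workaround besides dropping "sr": clamp the std to a small epsilon.
eps = 1e-12
safe = (sr - LANDSEA_MEAN["sr"]) / max(LANDSEA_STD["sr"], eps)
print(safe)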
